[57] ABSTRACT

Disclosed is a system for backing up files in a distributed computing system. A file server maintains files in a shared name space. The file server provides a first backup client program and a second backup client program with access to the files in the shared name space. The first backup client program initiates a backup request to back up a requested file. A determination is made as to whether the requested file is maintained in the shared name space. The backup request is transmitted to the second backup client program upon determining that the requested file is maintained in the shared name space. The second backup client program transmits a message to the file server to provide the requested file. The file server transmits the requested file to the second backup client program. The second backup client program then transmits the requested file to a backup server program. The backup server program stores the requested file in a storage device.

24 Claims, 3 Drawing Sheets 
SYSTEM INCLUDING A PROXY CLIENT TO
BACKUP FILES IN A DISTRIBUTED 
COMPUTING ENVIRONMENT 

FIELD OF THE INVENTION 

Preferred embodiments of the present invention relate to a system for backing up files in a distributed computing system and, in particular, to using a proxy client to back up files.

BACKGROUND OF THE RELATED ART 

In a distributed computing system, different computers, operating systems, and networks interact as if they were all part of a single system. The file system has a single set of global file names. A particular machine in the system need not know where a file is physically located; instead, the file may be accessed from anywhere in the network using its global file name. Global file names are part of the shared name space, which devices within the distributed file system may access. One such distributed file system is the Andrew File System (AFS), available from Transarc Corporation ("Transarc"). An AFS server performs the mapping between the directory name of a file and its location, making the file space location independent. With location independence, a user at a workstation linked to the network need only know the global file name, which includes the path name, and not the physical location where the file resides.

Another distributed file system is the Distributed File System (DFS), available from Transarc and International Business Machines Corp. (IBM), which is a component of the Distributed Computing Environment (DCE) standard promulgated by the Open Software Foundation (OSF). IBM is the assignee of the subject patent application. The DFS and AFS systems allow users to access data throughout the network. Any change made by one user to a file is available to all users. The DFS and AFS systems include security services that provide authentication to limit access to authorized users.

The AFS system offered by Transarc includes a backup program called "butc" (Backup Tape Coordinator). Butc is a volume backup system used to dump volume images to tape devices attached to the file server. However, the minimum backup unit for the butc program is a volume, because the butc program does not support file-level backup and recovery.

Hierarchical storage management programs, such as the IBM Adstar Distributed Storage Management (ADSM) product, provide backup/archive support and migrate less frequently used files to backup storage to free space. The ADSM server provides hierarchical storage management, backing files up on tape drives, optical disks, and other storage media. The ADSM backup feature saves copies of files from a client computer to a storage space managed by an ADSM server. Thus, data at a client computer running an ADSM client is protected in the event of data loss due to a hardware or software failure, accidental deletion, and/or logical corruption. With the ADSM program, clients can back up volumes, directories, subdirectories, or files. ADSM allows incremental backup of only those files that have changed. In this way, ADSM avoids performing a full dump on every backup, because only the modified files are backed up. This incremental backup reduces network utilization and traffic. The IBM ADSM product is described in "ADSM Version 2 Presentation Guide" (IBM Document SG24-4532-00, International Business Machines, copyright 1995), which publication is incorporated herein by reference in its entirety.
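Incremental selection can be illustrated with a minimal sketch. It assumes that a changed modification time is the only change signal and that the record of the previous backup is a plain dictionary from path to mtime. ADSM's actual change-detection rules are not described here, so `scan`, `select_changed`, and the catalog are hypothetical. Deleted files are not tracked.

```python
import os
import tempfile
from typing import Dict, List


def scan(root: str) -> Dict[str, float]:
    """Map every file under root to its current modification time."""
    snapshot: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                snapshot[path] = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it this pass
    return snapshot


def select_changed(current: Dict[str, float], last: Dict[str, float]) -> List[str]:
    """Files that are new, or whose mtime differs from the last backup."""
    return sorted(p for p, mtime in current.items() if last.get(p) != mtime)


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "a.txt")
    with open(path, "w") as f:
        f.write("v1")
    os.utime(path, (1000, 1000))           # fixed mtime keeps the demo deterministic
    catalog: Dict[str, float] = {}         # empty catalog: first pass is a full backup
    print(select_changed(scan(root), catalog) == [path])  # True
    catalog.update(scan(root))             # record the mtimes just backed up
    print(select_changed(scan(root), catalog))            # []
    os.utime(path, (2000, 2000))           # simulate a later modification
    print(select_changed(scan(root), catalog) == [path])  # True
```

Only the files returned by `select_changed` would be sent to the backup server. Because unchanged files are skipped, network traffic is reduced.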
IBM has combined the ADSM product with AFS and DFS file servers to provide backup support for these products. An AFS or DFS server would include an ADSM client to transfer files to an ADSM server, which then backs up the files in a storage device managed by the ADSM server. One problem with using such backup software in a distributed file system is that the client managing backup operations, such as the ADSM client, must read a file to be backed up. This reading operation consumes network resources. The ADSM client must then consume network resources again by transferring the file it has read from the file server to the ADSM server. Network traffic is further increased if the ADSM client is on a separate machine from the AFS/DFS server. The IBM publications entitled "ADSM AFS/DFS Backup Clients Version 2.1" (IBM Document SH26-404S-00, International Business Machines, copyright 1996) and "ADSM Concepts" (IBM Document SG24-4877-00, International Business Machines, copyright 1997) describe the use of the ADSM software in an AFS/DFS distributed file system. These publications are incorporated herein by reference in their entirety.

Network traffic can increase significantly if the AFS/DFS server and backup server are in one physical location, e.g., San Jose, Calif., and the AFS/DFS client and backup client requesting to back up a file in the AFS/DFS server are in a distant geographical location, e.g., Tucson, Ariz. Suppose a user in Tucson wanted to back up a file that resided in the global name space managed by the AFS/DFS server in San Jose. Under the prior art client/server protocol, the AFS/DFS client in Tucson would read the file, which requires transmitting the file from San Jose to Tucson over the network, and would then send the file back to the backup server in San Jose for backup storage. Such network traffic problems are exacerbated when the client requesting the backup is separated from the server by a long geographical distance.
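The cost of the San Jose/Tucson round trip can be shown with simple arithmetic. In the prior art flow, the file crosses the long-haul link twice. In the proxy arrangement described in this document, the file server, proxy client, and backup server share one site, so the file payload never leaves San Jose. This is a sketch under stated assumptions. It counts file payload only and ignores the small request and status messages and any protocol overhead. `wan_bytes` and the hop lists are hypothetical.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (source site, destination site) of one transfer of the file


def wan_bytes(hops: List[Hop], file_size: int) -> int:
    """Bytes of file payload that cross between two different sites."""
    return sum(file_size for src, dst in hops if src != dst)


# Prior art: the Tucson client reads the file from San Jose, then sends it back.
prior_art: List[Hop] = [
    ("San Jose", "Tucson"),    # file server -> remote client (read)
    ("Tucson", "San Jose"),    # remote client -> backup server (store)
]
# Proxy arrangement: only a small request leaves Tucson; the file stays in San Jose.
proxy: List[Hop] = [
    ("San Jose", "San Jose"),  # file server -> proxy client
    ("San Jose", "San Jose"),  # proxy client -> backup server
]

size = 500 * 2**20  # a 500 MiB file
print(wan_bytes(prior_art, size) // 2**20)  # 1000
print(wan_bytes(proxy, size) // 2**20)      # 0
```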

SUMMARY OF THE INVENTION 

To address the shortcomings in the prior art described above, preferred embodiments of the present invention provide a system for backing up files in a distributed computing system. A file server maintains files in a shared name space. The file server provides a first backup client program and a second backup client program with access to the files in the shared name space. The first backup client program initiates a backup request to back up a requested file. A determination is made as to whether the requested file is maintained in the shared name space. The backup request is transmitted to the second backup client program upon determining that the requested file is maintained in the shared name space. The second backup client program transmits a message to the file server to provide the requested file. The file server transmits the requested file to the second backup client program. The second backup client program then transmits the requested file to a backup server program. The backup server program stores the requested file in a storage device.
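The routing decision made by the first backup client program can be sketched as follows. The shared-name-space test uses the path markers that the detailed description gives as typical: "/afs" for AFS and "/..." for DFS. Treating a prefix match as sufficient is an assumption. The callables `back_up_locally` and `forward_to_proxy` are hypothetical stand-ins for the local read-and-send path and for the RPC call to the proxy client.

```python
from typing import Callable

# Prefixes the description gives as typical markers of the shared name space:
# "/afs" for AFS and "/..." (the DCE global root) for DFS.
SHARED_PREFIXES = ("/afs", "/...")


def in_shared_name_space(path: str) -> bool:
    """Block 42 heuristic: does the path start with a shared-name-space root?"""
    return any(path == p or path.startswith(p + "/") for p in SHARED_PREFIXES)


def route_backup(path: str,
                 back_up_locally: Callable[[str], str],
                 forward_to_proxy: Callable[[str], str]) -> str:
    """First backup client: hand shared-name-space files to the second
    (proxy) backup client; read and send everything else itself."""
    if in_shared_name_space(path):
        return forward_to_proxy(path)   # blocks 44 onward
    return back_up_locally(path)        # blocks 46-50


print(route_backup("/afs/example.com/u/a.txt", lambda p: "local", lambda p: "proxy"))  # proxy
print(route_backup("/home/u/a.txt", lambda p: "local", lambda p: "proxy"))             # local
```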

In further embodiments, the first backup client program is on a first computer machine, the second backup client program is on a second computer machine, the backup server program is on a third computer machine, and the file server is on a fourth computer machine. The first computer, second computer, third computer, and fourth computer communicate over a network system.

In yet further embodiments, the first backup client program is on a first computer machine, the second backup client program and backup server program are on a second computer machine, and the file server program is on a third computer machine. The first computer machine, second computer machine, and third computer machine communicate over a network system.

It is an object of preferred embodiments of the present invention to provide a system for backing up files in a shared name space, maintained in a file server that is part of a distributed computing environment, onto a storage device managed by a backup server program, such as a hierarchical storage management program.

It is yet a further object to reduce network traffic throughout the distributed computing environment by having a proxy client, which includes a copy of the backup client program, read the file from the file server and transmit the file to the backup server for storage in a storage device. In this way, a client at a remote location backing up a file does not have to read the file and retransmit it back to the location from which it came.

It is still a further object to increase data throughput rates between the backup client program and the backup server program when the backup client program transmits files to the backup server program.

It is yet a further object to provide authentication when a proxy client is used to access files in the file server.

BRIEF DESCRIPTION OF THE DRAWINGS 

Referring now to the drawings in which like reference 
numbers represent corresponding parts throughout: 

FIG. 1 is a block diagram illustrating a software and 
hardware environment in which preferred embodiments of 
the present invention are implemented; 

FIG. 2 is a block diagram illustrating an alternative 
software and hardware environment in which preferred 
embodiments of the present invention are implemented; and 

FIG. 3 is a flowchart showing logic to retrieve and backup 
data in accordance with preferred embodiments of the 
present invention. 

DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENTS 

In the following description, reference is made to the accompanying drawings, which form a part hereof and in which several embodiments of the present invention are shown by way of illustration. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.

Hardware and Software Environment 

FIGS. 1 and 2 illustrate hardware and software environments in which preferred embodiments of the present invention are implemented. FIG. 1 illustrates a distributed computing system 2 comprising four separate computing machines: a client 4, a backup server 6, a file server 8, and a proxy client 10. Each of these four computing machines may be a personal computer, workstation, mainframe, etc. The computers 4, 6, 8, 10 would include an operating system such as AIX, OS/2, Unix, Microsoft Windows, etc. These four machines 4, 6, 8, 10 include software that allows them to function as components in the distributed computing system 2, such as the IBM or Transarc Distributed Computing Environment (DCE) products. The computer machines 4, 6, 8, 10 may communicate via any suitable network technology known in the art, such as LAN, WAN, SNA networks, TCP/IP, the Internet, etc.

In the embodiment of FIG. 1, the backup server 6, storage devices 22, proxy client 10, and file server 8 are located in a Location A, and the client 4 is in a Location B. Locations A and B may be in distant geographical locations. In alternative embodiments, the file server 8, backup server 6, proxy client 10, and client 4 can be in a single location, dispersed throughout a single site, dispersed throughout different sites in the same geographical proximity, dispersed throughout different sites at distant geographical locations, etc. If the proxy client 10 and backup server 6 are on separate machines, then a high-speed connection line 12, e.g., a HIPPI or a high-speed switch, such as the high-speed switch built into the SP2 architecture, could connect the proxy client 10 and backup server 6.

The client 4 and proxy client 10 include a distributed file system (DFS) client program 14 that provides communication with the file server 8 and access to files in a shared name space. The file server 8 includes a DFS server program 16 that manages the shared name space and makes data in the shared name space available to machines within the distributed computing system 2 running the DFS client program 14. The DFS server program 16 further runs various distributed file system management processes. The DFS server program 16 and client program 14 may be part of a distributed file system (DFS) such as the AFS and DFS systems available from Transarc, the IBM Distributed Computing Environment (DCE), the Network File System ("NFS") products from Sun Microsystems, Inc., or any other suitable distributed file system software known in the art. The terms "DFS client program" and "DFS server program," as used herein, refer generally to a DFS system and not to the particular DFS system provided by Transarc, IBM, or any other software provider. The DFS client program 14 and server program 16 include a communication protocol that allows the client 4 and proxy client 10, including the DFS client program 14, to interface with the file server 8 via the DFS server program 16. In preferred embodiments, the DFS client program 14 and server program 16 may include a protocol, such as the DCE Remote Procedure Call (RPC), to provide communication therebetween. However, those skilled in the art will appreciate that alternative DFS communication protocols could be used to provide communication among systems within the distributed computing environment.

Machines running the DFS client program 14 are capable of accessing files in the shared name space managed by the DFS server program 16, regardless of where those files are physically located. The files would conform to a uniform global name space, providing attached machines 4, 6, 8, 10 with a global view of a set of files and directories independent of machine boundaries. The client 4, backup server 6, file server 8, and proxy client 10 may access the same shared name space and use the same global naming system in the distributed computing system 2. This allows access to the shared name space regardless of where the client 4, backup server 6, file server 8, and proxy client 10 are located.

The file server 8 or some other machine could perform authentication services to allow clients, such as client 4 and proxy client 10, to access files in the file server 8. In preferred embodiments, the DCE/RPC authentication protocols are implemented in the DFS client 14 and server 16 programs. Under such authentication protocols, when a user at a client 4 logs in, the client 4 requests a ticket to provide the client 4 access to a set of files maintained by the file server 8. To access a file in the shared name space, the client 4 or proxy client 10 establishes communication with the DFS server program 16 in the file server 8 using the RPC protocol. Part of this protocol would require the client 4 or proxy client 10 to present the authentication ticket to the DFS server program 16, which would determine whether the requesting client 4, 10 can access the files requested in the shared name space.

In preferred embodiments, the client 4 may establish communication with the proxy client 10 via an RPC call between the DFS client programs 14 in the client 4 and proxy client 10. The client 4 could transfer its authentication ticket to the proxy client 10 through the RPC protocol. The proxy client 10 could, in turn, establish communication with the file server 8 via an RPC call established between the DFS client program 14 in the proxy client 10 and the DFS server program 16 in the file server 8. Once communication is established, the proxy client 10 could use the authentication ticket provided by the client 4 to access files in the shared name space. In this way, the level of access that the proxy client 10 has to the file server 8 is limited to the level of access provided to the client 4, because the proxy client 10 uses the authentication ticket from the client 4 to access the shared name space.

The client 4 includes a backup client program 18 that allows the client 4 to communicate with the backup server 6 to back up data to which the client 4 has access. The backup client program 18 may be comprised of any program that allows a client to communicate with a server to back up and archive data, such as the IBM ADSM client. The backup server 6 includes a backup server program 20 that stores and manages data in storage devices 22. The storage devices 22 may be comprised of any non-volatile memory device suitable for long-term storage of data, such as a tape library, optical disk library, hard disk drives, holographic units, etc. The backup server program 20 may include a database program to manage and track the location of data in the storage devices 22. The backup server program 20 further includes communication protocol software to communicate with the backup client program 18. The backup server program 20 may be comprised of any program that allows a server to manage and back up data in an attached storage device 22, such as the IBM ADSM server program.

In preferred embodiments, the backup client program 18 may transfer a backup request to another backup client program 18. For instance, the backup client program 18 in the client 4 may transfer a request to back up a file in the file server 8 to the backup client program 18 in the proxy client 10.

FIG. 2 illustrates an alternative embodiment in which the proxy client 24 is a single computer machine including the backup server program 20, the DFS client program 14, and the backup client program 18. The proxy client computer 24 may be a personal computer, workstation, mainframe, etc. The proxy client computer 24 and file server 8 may be in the same geographical location and/or site, and the client 4 may be at another site and/or distant geographical location. In the embodiment of FIG. 2, where the backup client program 18 and backup server program 20 are located on the same computer machine (node) 24, the backup client 18 and server 20 programs may communicate using the memory of the proxy client computer 24. For instance, the IBM ADSM product provides a shared memory protocol for transferring data between an ADSM client and server located on the same machine using a common memory area on the computer machine. The backup client program 18 would access and read data and transmit such data through the shared memory space to the backup server program 20. The backup server program 20 would read the data copied to the shared memory space and then manage the storage of the transmitted data to the storage devices 22. As with the embodiment in FIG. 1, the backup client program 18 and backup server program 20 share the same global file name space through the DFS client program 14, which provides access to the file server 8 and the files maintained therein. The proxy client computer 24 executes the DFS client program 14 to interface with the DFS server program 16 and access files in the shared name space maintained in the file server 8.

The preferred embodiments may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term "article of manufacture" (or alternatively, "computer program product") as used herein is intended to encompass one or more computer programs and data files accessible from one or more computer-readable devices, carriers, or media, such as magnetic storage media, a floppy disk, a CD-ROM, a file server providing access to the programs via a network transmission line, a holographic unit, etc. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the present invention.

Using the Proxy Client to Back Up Data

FIG. 3 is a flowchart illustrating logic implemented in the programs 14, 16, 18, 20 described in FIGS. 1 and 2 to back up files maintained in the file server 8 in storage devices 22 managed by the backup server program 20. Those skilled in the art will recognize that this logic is provided for illustrative purposes only and that different logic may be used to accomplish the same results.

Control begins at block 40, which represents the client 4 initiating a backup operation of a file with the backup client program 18. The term "file" as used herein refers to an entire volume, logical unit, directory, subdirectory, individual file, or any other image of data. Control transfers to block 42, which is a decision block representing the backup client program 18 and/or DFS client program 14 determining whether the file to be backed up is in the shared name space. In the AFS file system, including "/afs" in the file path typically indicates that a file is in the shared name space. In the DFS file system, including "/..." typically indicates that a file is in the shared name space. If the requested file is in the shared name space, then control transfers to block 44; otherwise, control transfers to block 46.

If the file is not in the shared name space, then at block 46, the backup client program 18 in the client 4 accesses and reads the file. Control transfers to block 48, which represents the backup client program 18 in the client 4 transmitting the accessed file to the backup server 6. At block 50, the backup server 6, operating under control of the backup server program 20, backs up the file in the storage devices 22.

If the file is in the shared name space, then, at block 44, the client 4, operating under control of the DFS client program 14, makes a call to the proxy client 10 to perform the backup operation. The client 4 may use the RPC protocol to interface with the proxy client 10. Inside the proxy client 10, the backup request is passed to the backup client program 18. Control transfers to block 52, which represents the client 4, using the DCE/RPC protocol, passing to the proxy client 10 the authentication ticket presented to the client 4. The client 4 may include the authentication ticket with the request to the proxy client 10 to back up the file. This authentication ticket determines the level of access the client 4 has to the file server 8. Control transfers to block 54, which represents the proxy client 10 making a call to the file server 8 to access the file to be backed up. In the preferred embodiments, the proxy client 10, under control of the DFS client 14, makes an RPC call to the DFS server program 16 in the file server 8 to
access the shared name space. Control transfers to block 56, which represents the proxy client 10, through the RPC call established between the DFS client 14 and server 16 programs, providing the authentication ticket passed from the client 4 to the file server 8. The proxy client 10 may include the authentication ticket with the call to the file server 8 to access the file.

Control proceeds to block 58, which is a decision block representing the DFS server program 16 in the file server 8 determining whether the authentication ticket permits the proxy client 10 to access the file to be backed up. If so, control transfers to block 60; otherwise, control transfers to block 62. If the authentication ticket does not permit access to the file to be backed up, then control proceeds to block 62, which represents the file server 8 sending a message to the proxy client 10 that access to the file is not permitted. At block 64, the proxy client 10 notifies the client 4 that access to the file to be backed up was denied. As discussed, in preferred embodiments, the client 4 and proxy client 10 can communicate using the RPC interface.

If the authentication ticket from the client 4 permits access to the file, then at block 60, the file server 8 provides the requested file to the proxy client 10 via the call established between the DFS client program 14 in the proxy client 10 and the DFS server program 16 in the file server 8. Control proceeds to block 68, which represents the backup client program 18 in the proxy client 10 transmitting the file provided by the file server 8 to the backup server program 20 in the backup server 6. As discussed, the backup client program 18 in the proxy client 10 may communicate with the backup server program 20 via the high-speed communication line 12. Control transfers to block 70, which represents the backup server program 20 transferring the file to the storage devices 22. In preferred embodiments, the backup server program 20 transfers data from the proxy client 10 to the storage devices 22 simultaneously as the data is transmitted from the proxy client 10. Control transfers to block 72, which represents the proxy client 10 providing status information to the client 4 upon completion of the file backup. Control then proceeds to block 74, which represents the client 4 providing the status information to the user at client 4.

If authentication were unnecessary, then the logic of FIG. 3 would not include the steps to authenticate the level of access for the proxy client 10. In such a case, the file server 8 would provide the requested file to the proxy client 10 without performing authentication verification to determine whether access to the requested file is permitted.

The logic described in FIG. 3 is implemented in a distributed computing environment 2 in which the proxy client 10, including the backup client program 18, and the backup server program 20 are on separate computer machines 6, 10, such as in the environment described in FIG. 1. In the alternative embodiment of FIG. 2, the backup client program 18 and the backup server program 20 are installed on the same proxy client computer 24. Thus, the proxy client computer 24 includes the backup server program 20. In such a case, at block 68, the backup client program 18 would transfer the file to be backed up to the backup server program 20 via a shared memory space and not via the high-speed transmission line 12, as is the case with FIG. 1. In alternative embodiments, the backup client program 18 and backup server program 20 may communicate via the network system providing communication among the devices in the distributed computing system 2.

With the preferred embodiments discussed above, network traffic is significantly reduced because the client 4, which may be in a distant geographical location, e.g., Location B, from where the backup server 6 and file server 8 are located, e.g., Location A, does not have to read the file from the file server 8 and then retransmit the file back to the backup server 6. Greater reductions in network traffic are realized if the backup server 6, proxy client 10, and file server 8 are in a proximate location. Network traffic may be further reduced in those embodiments that utilize a separate high-speed connection 12 (FIG. 1) between the proxy client 10 and the backup server 6. This connection allows the backup client program 18 in the proxy client 10 to transfer the file to the backup server program 20 at high data rates and to bypass the main communication lines of the network.

Conclusion

This concludes the description of the preferred embodiments of the invention. The following describes some alternative embodiments for accomplishing the present invention.

Preferred embodiments utilize currently available products, such as ADSM, DFS, AFS, and NFS. However, any suitable program capable of performing the functions described herein could be substituted for the products used in the preferred embodiments described herein.

In preferred embodiments, certain operations are described as being performed by certain computer programs 14, 16, 18, 20. However, those skilled in the art will appreciate that an alternative combination of programs could be used to implement the logic of preferred embodiments of the invention. Moreover, the programs 14, 16, 18, 20 may themselves be comprised of one or more component computer programs, e.g., executable and data files, that function together to perform the operations described with respect to programs 14, 16, 18, 20.

In summary, preferred embodiments disclose a system for backing up files in a distributed computing system. A file server maintains files in a shared name space. The file server provides a first backup client program and a second backup client program with access to the files in the shared name space. The first backup client program initiates a backup request to back up a requested file. A determination is made as to whether the requested file is maintained in the shared name space. The backup request is transmitted to the second backup client program upon determining that the requested file is maintained in the shared name space. The second backup client program transmits a message to the file server to provide the requested file. The file server transmits the requested file to the second backup client program. The second backup client program then transmits the requested file to a backup server program. The backup server program stores the requested file in a storage device.

The foregoing description of the preferred embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto. The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
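The authentication path of FIG. 3 (blocks 52 through 72) can be sketched as an in-process model. The proxy has no credentials of its own and can only present the client's ticket, so the proxy can never reach more files than the client could. This is a minimal sketch under stated assumptions. Real DCE tickets are cryptographic credentials, whereas here a ticket is modeled as a hypothetical set of readable paths. The classes `Ticket`, `FileServer`, `BackupServer`, and `ProxyClient` and their methods are illustrative names, and RPC calls are replaced by ordinary method calls.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Ticket:
    """Hypothetical stand-in for the client's authentication ticket."""
    principal: str
    readable: FrozenSet[str]  # paths this ticket lets its holder read


class FileServer:
    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def read(self, path: str, ticket: Ticket) -> bytes:
        # Block 58: does the presented ticket permit access to this file?
        if path not in ticket.readable:
            raise PermissionError(f"{ticket.principal} may not read {path}")  # block 62
        return self.files[path]  # block 60


class BackupServer:
    def __init__(self) -> None:
        self.storage: Dict[str, bytes] = {}

    def store(self, path: str, data: bytes) -> None:
        self.storage[path] = data  # block 70


class ProxyClient:
    """Holds no credentials of its own: it can only use the client's ticket."""

    def __init__(self, file_server: FileServer, backup_server: BackupServer) -> None:
        self.file_server = file_server
        self.backup_server = backup_server

    def back_up(self, path: str, ticket: Ticket) -> str:
        try:
            data = self.file_server.read(path, ticket)  # blocks 54-56
        except PermissionError:
            return "denied"                             # block 64
        self.backup_server.store(path, data)            # block 68
        return "ok"                                     # block 72


fs = FileServer({"/afs/example.com/u/a": b"A", "/afs/example.com/u/b": b"B"})
bs = BackupServer()
proxy = ProxyClient(fs, bs)
alice = Ticket("alice", frozenset({"/afs/example.com/u/a"}))
print(proxy.back_up("/afs/example.com/u/a", alice))  # ok
print(proxy.back_up("/afs/example.com/u/b", alice))  # denied
print(sorted(bs.storage))                            # ['/afs/example.com/u/a']
```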
What is claimed is:

1. A method for backing up files in a distributed computing system, comprising the steps of:

maintaining, with a file server, files in a shared name space, wherein a first backup client program and a second backup client program are capable of accessing files in the shared name space via the file server;

initiating a backup request with the first backup client program to backup a requested file;

determining whether the requested file is maintained in the shared name space;

transmitting the backup request from the first backup client program to the second backup client program upon determining that the requested file is maintained in the shared name space;

transmitting a message with the second backup client program to the file server to provide the requested file;

transmitting the requested file with the file server to the second backup client program;

transmitting with the second backup client program the requested file to a backup server program; and

storing with the backup server program the requested file in a storage device.

2. The method of claim 1, wherein the first backup client program is on a first computer machine, the second backup client program is on a second computer machine, the backup server program is on a third computer machine, and the file server is on a fourth computer machine, wherein the first computer machine, second computer machine, third computer machine, and fourth computer machine communicate over a network system.

3. The method of claim 2, wherein the second backup client program and the backup server program communicate via a high speed communication line connecting the second computer machine and the third computer machine.

4. The method of claim 2, wherein the first and second computer machines include a distributed file system (DFS) client program and wherein the file server includes a DFS server program, wherein the DFS client program and DFS server program interface the first and second computer machines with the file server to allow access to files in the shared name space maintained by the file server.

5. The method of claim 2, further including the steps of:

issuing an authentication ticket to the first computer machine providing access to a set of files in the shared name space;

transmitting with the first computer machine the authentication ticket to the second computer machine;

transmitting with the second computer machine the authentication ticket to the file server in the fourth computer machine; and

determining with the file server whether the requested file is in the set of files to which the authentication ticket permits access, wherein the step of transmitting the requested file with the file server to the second computer machine including the second backup client program occurs upon determining that the requested file is in the set of files to which the authentication ticket permits access.

6. The method of claim 1, wherein the first backup client program is on a first computer machine, the second backup client program and backup server program are on a second computer machine, and the file server program is on a third computer machine, wherein the first computer machine, second computer machine, and third computer machine communicate over a network system.
7. The method of claim 6, wherein the second backup
client program and the backup server program communicate 
via a shared memory within the second computer machine. 

8. The method of claim 6, wherein the first backup client 
program communicates the request to backup the file to the 
second backup client program, wherein the first and second 
computer machines include a distributed file system (DFS) 
client program, wherein the file server includes a DFS server 
program, further including the step of the DFS client pro- 
gram in the second computer machine interfacing with the 
DFS server program in the third computer machine to 
provide the second computer machine access to files in the 
shared name space maintained by the file server. 

9. A distributed computing system for backing up files in 
a shared name space, comprising: 

(a) a first backup client program, including means for 
initiating a backup request to backup a requested file; 

(b) a second backup client program; 

(c) a backup server program; 

(d) a storage device managed by the backup server 
program; 

(e) a file server, wherein the file server provides access to 
files included in a shared name space, wherein the first 
backup client program and the second backup client 
program have access to files maintained in the shared 
name space through the file server; 

(f) means for determining whether the requested file is 
included in the shared name space; 

(g) means for transmitting the backup request to the 
second backup client program upon determining that 
the requested file is included in the shared name space; 

(h) means for transmitting a message with the second 
backup client program to the file server to provide the 
requested file; 

(i) means, performed by the file server, for transmitting 
the requested file to the second backup client program; 

(j) means, performed by the second backup client 
program, for transmitting the requested file to the 
backup server program; and 

(k) means, performed by the backup server program, for 
storing the requested file in the storage device. 

10. The distributed computing system of claim 9, further 
including: 

a first computer machine including the first backup client program;

a second computer machine including the second backup client program;

a third computer machine including the backup server program;

a fourth computer machine including the file server; and

a network system providing communication among the first computer machine, second computer machine, third computer machine, and fourth computer machine.

11. The distributed computing system of claim 10, further 
including a high speed communication line connecting the 
second computer machine and the third computer machine, 
wherein the second backup client program and the backup 
server program communicate via the high speed communi- 
cation line. 

12. The distributed computing system of claim 10, further 
including: 

a distributed file system (DFS) client program included 
within the first and second computer machines; and 

a DFS server program included in the file server, wherein the DFS client program and DFS server program interface the first and second computer machines and the file server to allow access to files in the shared name space maintained by the file server.

13. The distributed computing system of claim 10, further 
including: 

means for issuing an authentication ticket to the first computer machine providing access to a set of files in the shared name space;

means, performed by the first computer machine, for transmitting the authentication ticket to the second computer machine;

means, performed by the second computer machine, for transmitting the authentication ticket to the file server in the fourth computer machine; and

means, performed by the file server, for determining whether the requested file is in the set of files to which the authentication ticket permits access.

14. The distributed computing system of claim 9, further 
including: 

a first computer machine including the first backup client 
program; 

a second computer machine including the second backup 
client program and the backup server program; 

a third computer machine including the file server pro- 
gram; 

a network system providing communication between the 
first computer machine, the second computer machine, 
and the third computer machine. 

15. The distributed computing system of claim 14, further 
including a shared memory within the second computer 
machine, wherein the second backup client program and the 
backup server program communicate via the shared 
memory. 

16. The distributed computing system of claim 14, further 
including: 

a distributed file system (DFS) client program in the 
second computer machine; 

a DFS server program included in the file server, wherein 
the DFS client program interfaces the second computer 
machine with the DFS server program to access files in 
the shared name space maintained by the file server. 

17. An article of manufacture for use in programming a 
distributed computing system including a file server main- 
taining files in a shared name space, a first backup client 
program and a second backup client program, wherein the 
first and second backup client programs are capable of 
accessing files in the shared name space via the file server, 
the article of manufacture comprising at least one computer 
readable storage device including at least one computer 
program embedded therein that causes components within 
the distributed computing system to perform the steps of: 

(a) initiating a backup request with the first backup client 
program to backup a requested file; 

(b) determining whether the requested file is maintained 
in the shared name space; 

(c) transmitting the backup request to the second backup
client program upon determining that the requested file 
is maintained in the shared name space; 

(d) transmitting a message with the second backup client 
program to the file server to provide the requested file; 

(e) transmitting the requested file with the file server to the 
second backup client program; 

(f) transmitting with the second backup client program the 
requested file to a backup server program; and 
(g) storing with the backup server program the requested
file in a storage device. 

18. The article of manufacture of claim 17, wherein the first backup client program is on a first computer machine, the second backup client program is on a second computer machine, the backup server program is on a third computer machine, and the file server is on a fourth computer machine, wherein the first computer machine, second computer machine, third computer machine, and fourth computer machine communicate over a network system.

19. The article of manufacture of claim 18, wherein the second backup client program and the backup server program communicate via a high speed communication line connecting the second computer machine and the third computer machine.

20. The article of manufacture of claim 18, wherein the second computer machine includes a distributed file system (DFS) client program and wherein the file server includes a DFS server program, wherein the DFS client program and DFS server program interface the second computer machine with the file server to allow access to files in the shared name space maintained by the file server.

21. The article of manufacture of claim 18, further including the steps of:

issuing an authentication ticket to the first computer machine providing access to a set of files in the shared name space;

transmitting with the first computer machine the authentication ticket to the second computer machine;

transmitting with the second computer machine the authentication ticket to the file server in the fourth computer machine;

determining with the file server whether the requested file is in the set of files to which the authentication ticket permits access, wherein the step of transmitting the requested file with the file server to the second computer machine including the second backup client program occurs upon determining that the requested file is in the set of files to which the authentication ticket permits access.

22. The article of manufacture of claim 17, wherein the first backup client program is on a first computer machine, the second backup client program and backup server program are on a second computer machine, and the file server program is on a third computer machine, wherein the first computer machine, second computer machine, and third computer machine communicate over a network system.

23. The article of manufacture of claim 22, wherein the 
second backup client program and the backup server pro- 
gram communicate via a shared memory within the second 
computer machine. 

24. The article of manufacture of claim 22, wherein the first computer machine communicates the request to backup the file from the first backup client program to the second backup client program in the second computer machine, wherein the second computer machine includes a distributed file system (DFS) client program, and wherein the file server includes a DFS server program, wherein the DFS client program interfaces the second computer machine with the DFS server program to provide the second computer machine access to files in the shared name space maintained by the file server.
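The backup flow recited in claim 1, together with the ticket check of claim 5, can be sketched in Python as follows. This is a minimal, hypothetical illustration. The class and method names (`FileServer`, `ProxyBackupClient`, `BackupClient.backup`, and so on) are assumptions, not names the patent defines. In-memory dicts stand in for the shared name space and the storage device. The transport between programs (network, high speed line, or shared memory, per claims 3 and 7) is reduced to direct method calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileServer:
    """Hypothetical file server that maintains the shared name space."""
    files: dict[str, bytes]                                      # path -> contents
    tickets: dict[str, set[str]] = field(default_factory=dict)  # ticket -> permitted paths

    def in_shared_space(self, path: str) -> bool:
        return path in self.files

    def read(self, path: str, ticket: str) -> bytes:
        # Claim 5 style check: serve the file only if the ticket permits it.
        if path not in self.tickets.get(ticket, set()):
            raise PermissionError(f"ticket {ticket!r} does not permit {path!r}")
        return self.files[path]


@dataclass
class BackupServer:
    """Hypothetical backup server program; the dict stands in for the storage device."""
    storage: dict[str, bytes] = field(default_factory=dict)

    def store(self, path: str, data: bytes) -> None:
        self.storage[path] = data


@dataclass
class ProxyBackupClient:
    """Second backup client program (the proxy client)."""
    file_server: FileServer
    backup_server: BackupServer

    def handle(self, path: str, ticket: str) -> None:
        data = self.file_server.read(path, ticket)  # file server -> proxy
        self.backup_server.store(path, data)        # proxy -> backup server


@dataclass
class BackupClient:
    """First backup client program, which initiates the backup request."""
    file_server: FileServer
    proxy: ProxyBackupClient
    ticket: str

    def backup(self, path: str) -> bool:
        # Only files in the shared name space are forwarded to the proxy.
        if not self.file_server.in_shared_space(path):
            return False  # files outside the shared name space are not handled here
        self.proxy.handle(path, self.ticket)
        return True
```

The first client never reads the file itself. It only forwards the request and its ticket, so the file's bytes travel from the file server to the proxy and then to the backup server.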

* * * * *
[57] ABSTRACT

A method, apparatus, and article of manufacture for a 
computer implemented recovery system for restoring a 
database in a computer. The database contains objects and is 
stored on a primary data storage device connected to the 
computer. Objects of different types in the database are 
copied from the primary data storage device to a secondary 
data storage device. Modifications to the objects are logged 
in a log file. A recovery indicator is received that indicates 
that recovery of the objects in the database is required. The 
objects are copied from the secondary data storage device to 
the database on the primary data storage device. Modifica- 
tions in the log file are applied to the copied objects during 
one pass through the log file. 

24 Claims, 6 Drawing Sheets 





10/19/2002, EAST Version: 1.03.0007 



U.S. Patent 



Sep. 12, 2000 



Sheet 1 of 6 



6,119,128 



[FIG. 1: exemplary computer hardware environment. Computer 102 with Terminal Interface 108, Locking Services 110, System Services 112, and Database Services 114 (Relational Database System 116, Data Manager 118, Buffer Manager 120, Recovery System 122, Other Components 124); Monitor; data storage device 104; Log 106.]






[FIG. 5: flow diagram. 500 Copy Objects from Primary Data Storage Device to Secondary Data Storage Device; 502 Log All Operations for each Object in the Database that is Modified; 504 Receive Recovery Indicator.]






[FIG. 6: flow diagram. 600 Copy Objects from Secondary Data Storage Device to Primary Data Storage Device; 602 Determine Starting Point in Log File; 604 Apply Log Operations to All Objects With One Pass of the Log File.]




RECOVERING DIFFERENT TYPES OF OBJECTS WITH ONE PASS OF THE LOG

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates in general to computer-implemented database systems, and, in particular, to recovering different types of objects with one pass of the log.

2. Description of Related Art

Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives for semipermanent storage.

A table is assigned to a tablespace. The tablespace contains one or more datasets. In this way, the data from a table is assigned to physical storage on DASD. Each tablespace is physically divided into equal units called pages. The size of the tablespace's pages is based on the page size of the bufferpool specified in the tablespace's creation statement. The bufferpool is an area of virtual storage that is used to store data temporarily. A tablespace can be partitioned, in which case a table may be divided among the tablespace's partitions, with each partition stored as a separate dataset. Partitions are typically used for very large tables.

A table may have an index. An index is an ordered set of pointers to the data in the table. There is one physical order to the rows in a table that is determined by the RDBMS software, and not by a user. Therefore, it may be difficult to locate a particular row in a table by scanning the table. A user creates an index on a table, and the index is based on one or more columns of the table. A partitioned table must have at least one index. The index is called the partitioning index and is used to define the scope of each partition and thereby assign rows of the table to their respective partitions. The partitioning indexes are created in addition to, rather than in place of, a table index. An index may be created as UNIQUE so that two rows can not be inserted into a table if doing so would result in two of the same index values. Also, an index may be created as a CLUSTERING index, in which case the index physically stores the rows in order according to the values in the columns specified as the clustering index (i.e., ascending or descending, as specified by the user).

RDBMS software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO). The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages, such as C and COBOL. SQL allows the user to manipulate the data. As the data is being modified, all operations on the data are logged in a log file.

Typically, the database containing partitions and indexes is stored on a data storage device, called a primary data storage device. The partitions are periodically copied to another data storage device, called a secondary data storage device, for recovery purposes. In particular, the partitions stored on the primary data storage device may be corrupted, for example, due to a system failure during a flood, or a user may want to remove modifications to the data (i.e., back out the changes). In either case, for recovery, the partitions are typically copied from the secondary data storage device to the primary data storage device. Next, using the log file, the copied data is modified based on the operations in the log file. Then, the indexes are rebuilt. In particular, to rebuild the index, keys are retrieved from each row of each partition, sorted, and then used to create a partitioning index. Additionally, the table index is rebuilt via the same technique.

This technique for recovery of data and indexes is very costly in terms of performance. Additionally, users are not able to access data while recovery is taking place. For a user or company requiring the use of computers to do business, much money can be lost during recovery. Therefore, it is important to improve the efficiency of the recovery process, and there is a need in the art for an improved recovery system.

SUMMARY OF THE INVENTION

To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for a computer implemented recovery system for restoring a database in a computer.

In accordance with the present invention, the database contains objects and is stored on a primary data storage device connected to the computer. Objects of different types in the database are copied from the primary data storage device to a secondary data storage device. Modifications to the objects are logged in a log file. A recovery indicator is received that indicates that recovery of the objects in the database is required. The objects are copied from the secondary data storage device to the database on the primary data storage device. Modifications in the log file are applied to the copied objects during one pass through the log file.

An object of the invention is to provide an improved recovery system for a database. Another object of the invention is to provide recovery for partitions, partitioning indexes, and table indexes simultaneously. Yet another object of the invention is to provide a recovery system for a database that requires only one pass of a log file to apply modifications to the database.

DESCRIPTION OF THE DRAWINGS

Referring now to the drawings in which like reference numbers represent corresponding parts throughout:

FIG. 1 illustrates an exemplary computer hardware environment that could be used in accordance with the present invention;

FIG. 2 illustrates a conventional system for recovery of a database;

FIG. 3 illustrates the recovery system in accordance with the present invention;

FIG. 4 provides an example that illustrates the recovery system in accordance with the present invention;

FIG. 5 is a flow diagram illustrating the steps performed by the recovery system prior to recovery of a database in accordance with the present invention; and

FIG. 6 is a flow diagram illustrating the steps performed by the recovery system to recover a database in accordance with the present invention.
DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT 

In the following description of the preferred embodiment, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.

Hardware Environment 

FIG. 1 illustrates an exemplary computer hardware envi- 
ronment that could be used in accordance with the present 
invention. In the exemplary environment, a computer system 
102 is comprised of one or more processors connected to one 
or more data storage devices 104 and 106 that store one or 
more relational databases, such as a fixed or hard disk drive, 
a floppy disk drive, a CDROM drive, a tape drive, or other 
device. 

Operators of the computer system 102 use a standard 
operator interface 108, such as IMS/DB/DC®, CICS®, 
TSO®, OS/390® or other similar interface, to transmit 
electrical signals to and from the computer system 102 that 
represent commands for performing various search and 
retrieval functions, termed queries, against the databases. In 
the present invention, these queries conform to the Struc- 
tured Query Language (SQL) standard, and invoke functions 
performed by Relational DataBase Management System 
(RDBMS) software. In the preferred embodiment of the 
present invention, the RDBMS software comprises the 
DB2® product offered by IBM for the MVS® or OS/390® 
operating systems. Those skilled in the art will recognize, however, that the present invention has application to any RDBMS software that uses SQL.

As illustrated in FIG. 1, the DB2® architecture for the 
MVS® operating system includes three major components: 
the Internal Resource Lock Manager (IRLM) 110, the Systems Services module 112, and the Database Services module 114. The IRLM 110 handles locking services for the
DB2® architecture, which treats data as a shared resource, 
thereby allowing any number of users to access the same 
data simultaneously. Thus concurrency control is required to 
isolate users and to maintain data integrity. The Systems 
Services module 112 controls the overall DB2® execution 
environment, including managing log data sets 106, gather- 
ing statistics, handling startup and shutdown, and providing 
management support. 

At the center of the DB2® architecture is the Database 
Services module 114. The Database Services module 114 
contains several submodules, including the Relational Data- 
base System (RDS) 116, the Data Manager 118, the Buffer 
Manager 120, the Recovery System 122, and other compo- 
nents 124, such as an SQL compiler/interpreter. These 
submodules support the functions of the SQL language, i.e. 
definition, access control, interpretation, compilation, data- 
base retrieval, and update of user and system data. The 
Recovery System 122 works with the components of the 
computer system 102 to restore a database. 

The present invention is generally implemented using 
SQL statements executed under the control of the Database 
Services module 114. The Database Services module 114 
retrieves or receives the SQL statements, wherein the SQL 
statements are generally stored in a text file on the data 
storage devices 104 and 106 or are interactively entered into 
the computer system 102 by an operator sitting at a monitor 
124 via operator interface 108. The Database Services 
module 114 then derives or synthesizes instructions from the 
SQL statements for execution by the computer system 102. 
Generally, the RDBMS software, the SQL statements, and the instructions derived therefrom, are all tangibly embodied in a computer-readable medium, e.g. one or more of the data storage devices 104 and 106. Moreover, the RDBMS software, the SQL statements, and the instructions derived therefrom, are all comprised of instructions which, when read and executed by the computer system 102, cause the computer system 102 to perform the steps necessary to implement and/or use the present invention. Under control of an operating system, the RDBMS software, the SQL statements, and the instructions derived therefrom, may be loaded from the data storage devices 104 and 106 into a memory of the computer system 102 for use during actual operations.

Thus, the present invention may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term "article of manufacture" (or alternatively, "computer program product") as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope of the present invention.

Those skilled in the art will recognize that the exemplary environment illustrated in FIG. 1 is not intended to limit the present invention. Indeed, those skilled in the art will recognize that other alternative hardware environments may be used without departing from the scope of the present invention.

Recovering Different Types of Objects With One Pass of 
The Log 

The present invention provides a recovery system 122 for recovering different types of objects using only one pass through a log file. In particular, table partitions of a database, along with indexes (e.g., partitioning indexes and table indexes), are copied to one or more data storage devices, such as magnetic tape. The database may be stored on a primary data storage device, while the copies of the database partitions and indexes are stored on a secondary data storage device. The primary and secondary data storage devices could be the same or different devices.

Then, as modifications are made to the data in the table partitions, the modifications are logged in a log file. If recovery of the table partitions and partitioning indexes is required, the recovery system 122 of the present invention copies the table partitions and partitioning indexes from the secondary data storage device back to the database. Then, the recovery system 122 modifies both the table partitions and the partitioning indexes while making one pass through the log file. That is, the recovery system 122 extracts all of the pertinent log records containing updates to all of the objects being recovered in a single read pass of logged changes.

The recovery system 122 allows for independent recovery of the data and indexes. It also significantly decreases elapsed time, because the logged updates are applied to all objects in the database with one pass through the log file.
60 database with one pass through the log file. 

FIG. 2 illustrates a conventional system for recovery of a database. In a conventional system, partitions 200 and 202 of a database are copied from primary data storage devices to secondary data storage devices 204 and 206. In a conventional system, the partitioning indexes 208 and 210 and the table index 212 are not stored on secondary data storage devices. Then, when recovery is required, the conventional system copies the partitions 200 and 202 from the secondary data storage devices 204 and 206 to the database on the primary data storage devices. The conventional system applies modifications logged in a log file to the copied partitions. Then, the conventional system reads each row of each partition 200 and 202 and retrieves index keys 214 and 216 for each row of each partition 200 and 202. The index keys 214 and 216 are sorted and are used to rebuild indexes 208 and 210, respectively. Table index 212 is rebuilt in the same manner. This procedure has a high performance cost.
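For contrast, the conventional index rebuild described above (read every row, retrieve its key, sort the keys) can be sketched as follows. A sorted Python list of (key, partition number, row id) tuples stands in for the rebuilt index. The row layout and the function name `rebuild_index` are hypothetical; a production index would normally be a tree structure. The sort alone costs O(n log n) in the number of rows, on top of reading every row of every partition.

```python
from __future__ import annotations

from typing import Any


def rebuild_index(partitions: list[dict[int, dict[str, Any]]],
                  key_col: str) -> list[tuple[Any, int, int]]:
    """Rebuild an index the conventional way: scan all rows, extract keys, sort."""
    entries = []
    for part_no, rows in enumerate(partitions):
        for rid, row in rows.items():                      # read every row of every partition
            entries.append((row[key_col], part_no, rid))   # retrieve the index key
    entries.sort()                                         # sort the keys
    return entries                                         # sorted entries stand in for the index
```

Every row of every partition is read on every recovery, whether or not any of its keys changed. The one-pass approach avoids this by restoring the index from its own copy and applying only the logged index changes.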

FIG. 3 illustrates the recovery system 122 in accordance with the present invention. Initially, the partitions 300 and 302 are copied to secondary data storage devices 304 and 306. Also, partitioning indexes 308 and 312 are copied to secondary data storage devices 310 and 314. The table index 316 is also copied to a secondary data storage device 318. Then, as application programs 320 modify the database by adding, updating, or deleting data via operations, the modifications are logged in the log file 322. The log file may be copied to a secondary data storage device 324 if the log file on the primary storage device becomes full. The log file 322 contains information identifying modifications to both the partitions and indexes.
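A single log can drive recovery of both partitions and indexes. A hypothetical logging helper illustrates this: one application-level insert writes one record for the table partition and one for the index, much like the "New Emp Data" and "New Emp Index" entries in the FIG. 4 example. The `Log` class, the record fields, `insert_row`, and the object names used here are all assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LogRecord:
    lsn: int      # hypothetical log sequence number
    obj: str      # object the change applies to (partition or index)
    op: str       # "put" or "delete"
    key: Any
    value: Any = None


class Log:
    """Append-only in-memory log; a stand-in for the log file."""

    def __init__(self) -> None:
        self.records: list[LogRecord] = []

    def append(self, obj: str, op: str, key: Any, value: Any = None) -> LogRecord:
        rec = LogRecord(len(self.records) + 1, obj, op, key, value)
        self.records.append(rec)
        return rec


def insert_row(log: Log, partition: str, index: str,
               rid: int, row: dict[str, Any], key_col: str) -> None:
    # One application-level insert yields two log records, one per object,
    # so the log alone is enough to redo the change in both.
    log.append(partition, "put", rid, row)        # the "data" entry
    log.append(index, "put", row[key_col], rid)   # the "index" entry
```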

For recovery, the partitions are copied from the secondary
data storage devices 304 and 306 back to the primary data
storage device. The partitioning indexes are copied from the
secondary data storage devices 310 and 314 to the primary
data storage device. Additionally, the table index is copied
from the secondary data storage device 318 to the primary
data storage device. Then, the log records subsequent to the
last copying from the primary to the secondary data storage
devices are applied to the partitions and indexes. In
particular, while reading the log file through once, the
recovery system 122 modifies both the partitions 300 and
302 and the indexes 308, 312, and 316.
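The single-pass roll-forward described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each restored image copy is an in-memory dictionary and that every log record carries a position, the name of the partition or index it changes, a key, and a new value (None for a delete). The names LogRecord and recover_one_pass are hypothetical and do not describe the actual log format of the recovery system 122.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    position: int             # place of the record in the log (hypothetical field)
    obj: str                  # object changed: a partition or an index
    key: str                  # row identifier (partition) or index key (index)
    new_value: Optional[str]  # None models a delete


def recover_one_pass(images: Dict[str, Dict[str, str]],
                     log: Iterable[LogRecord],
                     last_copy_position: int) -> Dict[str, Dict[str, str]]:
    """Roll restored image copies forward while reading the log exactly once.

    Partitions and indexes are handled by the same loop, so index changes are
    replayed from the log instead of being rebuilt by re-reading every row.
    """
    for rec in log:                              # one sequential pass
        if rec.position <= last_copy_position:   # already in the image copy
            continue
        target = images[rec.obj]
        if rec.new_value is None:
            target.pop(rec.key, None)
        else:
            target[rec.key] = rec.new_value
    return images
```

Because index changes are logged alongside data changes, no row of a partition has to be re-read to rebuild an index, which is the cost the conventional procedure incurs.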

FIG. 4 provides an example that illustrates the recovery
system 122 in accordance with the present invention.
Initially, the partition 400 is copied to a secondary data
storage device 402, and the partitioning index 404 is copied
to a secondary data storage device 406. Then, a user
application program and/or a data management system 408
perform operations on the data in the partition 400 and
partitioning index 404. The operations are logged in the log
file 410.

For example, if the first operation adds a new employee,
the recovery system 122 modifies the partition 400 and the
partitioning index 404. In particular, the recovery system
122 adds entries to the log file for "New Emp Data" and
"New Emp Index". Then, if an operation updates salary
information for an employee, the recovery system 122
modifies the partition 400. The log file then contains an entry
for "New Salary, Old Salary" that identifies the salary before
and after modification. Next, if the partition 400 and
partitioning index 404 are to be copied, a log range file 414
and a copies file 416 are modified. In particular, the log file
410 is separated into ranges. The log range file 414 indicates
each of the ranges; for example, Range1 goes from range
identifier L1 to range identifier L2. The copies file 416
indicates the partitions and indexes that have been copied,
together with a range identifier. The range identifier, for
example L2, indicates that the copied data for the partitions
and indexes includes all of the operations logged up to range
identifier L2.

Next, if the name of an employee is changed, the partition
400 and partitioning index 404 are modified. Then, the log
file 410 contains an entry for "New Name, Old Name" that
provides the new and old name of the employee whose name
changed, and an entry for "New Name Index, Old Name
Index" that provides the index modification. Next, assuming
that there is a loss of data, recovery of the data is required.
Initially, the partition 400 and the partitioning index 404 are
copied from the secondary data storage devices back to the
primary data storage device. Since, according to the copies
file 416, these copies include all modifications up to range
identifier L2, only operations after range identifier L2 are
applied to the partition 400 and the partitioning index 404 to
recover the database. Moreover, during one pass through the
log file, the recovery system 122 identifies the required
modifications and applies them to both the partition 400 and
the partitioning index 404.
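The use of the log range file 414 and the copies file 416 to choose a starting point might be sketched as follows. This is an illustrative assumption rather than the file layout of the embodiment: range identifiers L1, L2, ... are modelled as the integers 1, 2, ..., and the hypothetical helpers start_point, ranges_to_read and must_apply stand in for whatever the recovery system 122 actually does.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory forms of the two control files. Range identifiers
# L1, L2, ... are modelled as the integers 1, 2, ...
LogRangeFile = Dict[str, Tuple[int, int]]  # range name -> (first id, last id)
CopiesFile = Dict[str, int]                # object -> id its image copy includes


def start_point(copies: CopiesFile, objects: Iterable[str]) -> int:
    """Identifier after which log records must be applied.

    When several objects are recovered together, the oldest image copy decides
    where the single pass through the log has to begin.
    """
    return min(copies[obj] for obj in objects)


def ranges_to_read(ranges: LogRangeFile, start: int) -> List[str]:
    """Names of the log ranges that hold records logged after `start`."""
    return sorted(name for name, (_, last) in ranges.items() if last > start)


def must_apply(copies: CopiesFile, obj: str, record_id: int) -> bool:
    """A record is replayed only if the object's image copy predates it."""
    return record_id > copies[obj]
```

With Range1 covering L1 to L2 and both image copies complete up to L2, only the ranges after L2 would have to be read, which matches the example above.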

FIG. 5 is a flow diagram illustrating the steps performed 
by the recovery system 122 prior to recovery of a database 
in accordance with the present invention. In Block 500, the 
recovery system 122 copies objects from a primary data 
storage device to a secondary data storage device. In Block 
502, the recovery system 122 logs all operations for each 
object in the database that is modified. In Block 504, the 
recovery system 122 receives a recovery indicator. 

FIG. 6 is a flow diagram illustrating the steps performed 
by the recovery system 122 to recover a database in accor- 
dance with the present invention. In Block 600, the recovery 
system 122 copies objects from the secondary data storage 
device to the primary data storage device. That is, each of the 
objects is replaced by an image copy taken at a previous 
time. The individual objects may be restored from the image 
copies concurrently with each other. In Block 602, the 
recovery system 122 determines the point in the log at which 
to start applying operations. In Block 604, the recovery 
system 122 applies log operations to all objects through one 
pass of the log file. 

That is, beginning at the determined starting point, the 
recovery system 122 reads the log file and extracts the 
changes for each individual object. The recovery system 122 
applies the changes to the specified elements within the 
object. The changes to an individual object may be applied 
concurrently with changes being applied to the other objects 
involved in the recovery. This continues until all the required 
changes have been applied to all the specified objects. 
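Block 604 and the paragraph above describe extracting the changes for each object during one read of the log and applying them to different objects concurrently. A minimal Python sketch under simplifying assumptions follows: objects are in-memory dictionaries, log records are plain tuples, and, as a simplification, the pass first groups the changes per object and then hands each object's list to a thread pool, rather than streaming changes to workers while the log is still being read. All names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

# (log position, object name, key, new value or None for a delete)
Change = Tuple[int, str, str, Optional[str]]


def split_by_object(log: List[Change], start: int) -> Dict[str, List[Change]]:
    """Read the log once from the starting point, grouping changes per object."""
    per_object: Dict[str, List[Change]] = defaultdict(list)
    for change in log:
        if change[0] > start:
            per_object[change[1]].append(change)
    return per_object


def apply_changes(image: Dict[str, str], changes: List[Change]) -> None:
    """Apply one object's changes, in log order, to its restored image copy."""
    for _, _, key, value in changes:
        if value is None:
            image.pop(key, None)
        else:
            image[key] = value


def recover(images: Dict[str, Dict[str, str]], log: List[Change], start: int) -> None:
    """Each object has its own image, so objects can be rolled forward in parallel."""
    per_object = split_by_object(log, start)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(apply_changes, images[obj], changes)
                   for obj, changes in per_object.items()]
        for future in futures:
            future.result()  # surface any exception raised in a worker
```

Because the changes for one object stay in log order and no two workers touch the same object, the per-object ordering that recovery depends on is preserved.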

Conclusion 

This concludes the description of the preferred embodi- 
ment of the invention. The following describes some alter- 
native embodiments for accomplishing the present inven- 
tion. For example, any type of computer, such as a 
mainframe, minicomputer, or personal computer, or com- 
puter configuration, such as a timesharing mainframe, local 
area network, or standalone personal computer, could be 
used with the present invention. 

In summary, the present invention discloses a method, 
apparatus, and article of manufacture for a computer- 
implemented recovery system. The present invention pro- 
vides an improved recovery system for a database. 
Additionally, the present invention provides recovery for 
partitions and partitioning indexes simultaneously. 
Moreover, the present invention provides a recovery system 
for a database that requires only one pass of a log file to 
apply modifications to the database. 

The foregoing description of the preferred embodiment of 
the invention has been presented for the purposes of illus- 
tration and description. It is not intended to be exhaustive or 
to limit the invention to the precise form disclosed. Many 
modifications and variations are possible in light of the 
above teaching. It is intended that the scope of the invention 
be limited not by this detailed description, but rather by the 
claims appended hereto. 
What is claimed is:

1. A method of restoring a database in a computer, the
database containing objects and being stored on a primary
data storage device connected to the computer, the method
comprising the steps of:

copying objects of different types in the database from the
primary data storage device to a secondary data storage
device, wherein one of the objects is a table index for
locating data in a table, and wherein one of the objects
is a partitioning index for defining a scope of each
partition and thereby assigning a row of the table to its
respective partition;

logging modifications to the objects, including the table 
index and the partitioning index, in a log file; 

receiving a recovery indicator indicating that recovery of 
the objects in the database is required; 

copying the objects, including the table index and the
partitioning index, from the secondary data storage
device to the database on the primary data storage
device; and

applying the modifications in the log file to the copied 
objects, including the table index and the partitioning 
index, during one pass through the log file. 

2. The method of claim 1, wherein the types of the objects
include table data.

3. The method of claim 1, wherein the types of the objects 
include partition indexes. 

4. The method of claim 1, wherein the recovery indicator
indicates that modifications to the objects are to be reversed.

5. The method of claim 1, wherein the recovery indicator 
indicates that the objects have been corrupted. 

6. The method of claim 1, further comprising the step of 
maintaining log ranges. 

7. The method of claim 1, further comprising the step of
maintaining copy information for objects that are copied.

8. The method of claim 7, further comprising the step of 
determining a starting point in the log file based on the copy 
information. 

9. An apparatus for restoring a database in a computer,
comprising:

a computer having a primary data storage device con- 
nected thereto, wherein the primary data storage device 
stores a database containing objects; 

one or more computer programs, performed by the
computer, for copying objects of different types in the
database from the primary data storage device to a
secondary data storage device, wherein one of the
objects is a table index for locating data in a table, and
wherein one of the objects is a partitioning index for
defining a scope of each partition and thereby assigning
a row of the table to its respective partition, logging
modifications to the objects, including the table index
and the partitioning index, in a log file, receiving a
recovery indicator indicating that recovery of the
objects in the database is required, copying the objects,
including the table index and the partitioning index,
from the secondary data storage device to the database
on the primary data storage device, and applying the
modifications in the log file to the copied objects,
including the table index and the partitioning index,
during one pass through the log file.



10. The apparatus of claim 9, wherein the types of the 
objects include table data. 

11. The apparatus of claim 9, wherein the types of the 
objects include partition indexes. 

12. The apparatus of claim 9, wherein the recovery 
indicator indicates that modifications to the objects are to be 
reversed. 

13. The apparatus of claim 9, wherein the recovery 
indicator indicates that the objects have been corrupted. 

14. The apparatus of claim 9, further comprising the 
means for maintaining log ranges. 

15. The apparatus of claim 9, further comprising the 
means for maintaining copy information for objects that are 
copied. 

16. The apparatus of claim 15, further comprising the 
means for determining a starting point in the log file based 
on the copy information. 

17. An article of manufacture comprising a computer 
program carrier readable by a computer and embodying one 
or more instructions executable by the computer to perform 
method steps for restoring a database, the database contain- 
ing objects and being stored on a primary data storage device 
connected to the computer, the method comprising the steps 
of: 

copying objects of different types in the database from the 
primary data storage device to a secondary data storage 
device, wherein one of the objects is a table index for 
locating data in a table, and wherein one of the objects 
is a partitioning index for defining a scope of each 
partition and thereby assigning a row of the table to its 
respective partition; 

logging modifications to the objects, including the table 
index and the partitioning index, in a log file; 

receiving a recovery indicator indicating that recovery of 
the objects in the database is required; 

copying the objects, including the table index and the 
partitioning index, from the secondary data storage 
device to the database on the primary data storage 
device; and 

applying the modifications in the log file to the copied 
objects, including the table index and the partitioning 
index, during one pass through the log file. 

18. The method of claim 17, wherein the types of the 
objects include table data. 

19. The method of claim 17, wherein the types of the 
objects include partition indexes. 

20. The method of claim 17, wherein the recovery indi- 
cator indicates that modifications to the objects are to be 
reversed. 

21. The method of claim 17, wherein the recovery indi- 
cator indicates that the objects have been corrupted. 

22. The method of claim 17, further comprising the step 
of maintaining log ranges. 

23. The method of claim 17, further comprising the step 
of maintaining copy information for objects that are copied.

24. The method of claim 23, further comprising the step 
of determining a starting point in the log file based on the 
copy information. 

DATA PROCESSING SYSTEM FOR 
COMMUNICATIONS NETWORK 

Matter enclosed in heavy brackets [ ] appears in the
original patent but forms no part of this reissue
specification; matter printed in italics indicates the
additions made by reissue.

RELATED APPLICATION 

This application is related to my copending commonly
assigned application Ser. No. 08/392,975 filed Mar. 6, 1995.

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates to a data system for data
collection and processing in multi-network communications.

2. Related Art 

Where communication instances, for instance telephone
calls or data transfers, occur within a single network, it is
known to log and process data related to those
communication instances. Commonly, in a public switched
telephone network (PSTN), data will be collected
concerning call duration, and processed with respect to at
least time of day and call type, so that the network operator
can generate an item on a bill destined for the subscriber
who initiated a call.

Over recent years, the data systems for PSTNs have
necessarily become increasingly complex as the choice of
service and call type available to subscribers has greatly
increased. For instance, with the introduction of 0800
numbers, it is no longer the initiating subscriber who will be
billed. Many more complicated services are already being
trialled, or are available, on PSTNs, such as call forwarding,
where a call initiated by a first subscriber to a selected
number is forwarded automatically by the network to a
different number, the difference in cost being borne by the
receiving subscriber.

Another aspect of communication networks which is in
the course of considerable change is the multiplicity of
network operators in existence. In the past, PSTNs have
been run primarily by government organizations as part of
the national infrastructure. Nowadays, and increasingly,
privatization of the PSTNs and the relaxation of regulatory
monopolies mean that there are many more network
operators available to the subscriber, and these network
operators must, for practical reasons, provide inter-network
connection. This means that a network operator must take
into account not only communication instances arising in
their own network or in a limited number of interconnected
networks of independent but similar administrations, but
also communication instances arising in a theoretically very
large number of competing networks of different types,
providing a wide variety of services to subscribers.

It is, therefore, of increasing importance that data be
collected and processed in connection with communication
instances arising outside an operator's network but
terminating in, or simply crossing, the operator's network.

When calls pass through the network of more than one
operator, price and charging agreements between operators
for the carriage of each other's calls come into play. Such
arrangements can vary from the simple Sender Keeps All
(SKA) arrangement to complex pricing formulae.

It has been an established practice in telecommunications,
between separate network operators or administrations, that
call data would be collected by the administration
responsible for the network in which a call arises. If that
call then terminates in a second network, the administration
concerned with the second network relies on the data
collected by the administration responsible for the first
network, for instance for accounting purposes. However, the
telecommunications environment is changing quickly,
politically as well as technically. With the advent of greater
competition, it is increasingly attractive to network
administrations to monitor not only traffic arising in their
own network but also traffic arising elsewhere but crossing
or terminating in their own network. If the network in which
traffic arises belongs to a competing operator or
administration, it is desirable that it is at least possible to
cross-check the competing operator's accounts.

In known arrangements, data collection points concerning
calls in a PSTN have been at the local exchanges of a
network, since the local exchange picks up traffic as it
arises. This arrangement, however, does not provide for data
collection with respect to inter-network traffic. Even were
there to be data collection points to collect data on calls
incoming to a network, the logistics involved in processing
such data to any level of detail are daunting. For instance, it
has been estimated that calls incoming to the PSTN operated
in Britain by British Telecommunications plc (BT) from
other network administrations, including the Isle of Man and
the Cellnet cellular network, totalled 15.4 million calls per
day in the twelve months prior to March 1992. This figure is
expected to increase to the order of 27 million calls a day in
the year prior to March 1995. Taking all call instances into
account, including those arising within the BT PSTN, 60
million call instances per day have been predicted for 1995.

SUMMARY OF THE INVENTION 

In spite of the very large quantity of data involved, it has
been found possible in making the present invention to
design a process for collecting and processing data relating
to calls incoming to a major telecommunications network,
the British PSTN, which can produce an output in sufficient
detail to allow the associated network administration to
generate account information which not only can be
allocated to outside network administrations appropriately,
but also supports itemized billing. That is, the account
information can be broken down in sufficient detail even to
identify individual calls, so far as they fulfill preselected
criteria, in the manner of itemized billing currently
available in the national billing system for the British PSTN
from British Telecommunications plc.

According to a first aspect of the present invention, there
is provided a process for collecting and processing data
concerning communication instances in a first
communications network, wherein the network includes at
least one point of connection, either directly or indirectly,
to a second communications network, by means of which
point of connection a communication instance arising in
said second network can be transmitted into, and either
cross or terminate in, said first network, the process
comprising the steps of:

i) collecting data at a data access point at said point of
connection, said data concerning a communication
instance arising in said second network and comprising
route information and at least one parameter
measurement, for example duration, with respect to said
communication instance;

ii) transmitting said data into a data processing system; 
and 

iii) processing said data. 

By collecting the data at a point of connection between the
first network and another network, it becomes possible for
the administration associated with the first network to obtain
first-hand information about communication instances
incoming to the first network, and thus potentially to
cross-check data provided by other network operators or
administrators.

According to a second aspect of the present invention,
there is provided a data processing arrangement for
processing data collected in a PSTN at a point of connection
with another network, the arrangement comprising:

i) a data input for inputting data concerning communication
instances from a communication network, said data
comprising at least one of a plurality of sort
characteristics;

ii) verifying means for checking the integrity and
sufficiency of data received at the data input;

iii) a data analyzer for analyzing data rejected by the
verifying means, and for submitting amended or default
data to the verifying means;

iv) pricing means for pricing verified data which has been
output by the verifying means, in accordance with
updatable reference information; and

v) output means for outputting priced, verified data from
the pricing means into memory locations, each memory
location being dedicated to data relevant to one or more
of said sort characteristics.

Preferably, the pricing means can also validate data and
output errored data to a data analyzer, which may be the
above data analyzer or a different one, so that data which
has been corrupted can potentially be reformatted or
otherwise corrected and, therefore, re-entered to the system
as a valid record of a communication instance.

It may also (or alternatively) be that this further data
analysis step is used to analyze the data with respect to a
different type of fault. For instance, errored data located by
the verifying means might be errored principally in respect
of format and routing information, while errored data from
the pricing means might be errored principally in respect of
pricing information.

The sort characteristics will typically be such that the
memory locations can hold data relevant to communication
instances which will be billable to a common accounting
entity, for instance, instances arising in a common
respective communications network.

The sort characteristics might be applied at any one of
several stages of the data processing arrangement described
above. However, in a PSTN, for example, the nature of the
errored data usually arising makes it preferable to provide
sorting means between (iii), the data analyzer associated
with the verifying means, and (iv), the pricing means. The
pricing means therefore acts on data already sorted. If the
sort characteristics relate to the different entities who will
be billed in respect of the communication instances
represented by the data, then this arrangement also has the
potential advantage that the pricing means can be simplified
in applying constraints relevant to individual entities.

It might be noted that a network such as the BT PSTN
comprises both local and trunk exchanges and provides not
only inter-exchange call transmission but also local call
delivery to the end user. This means that the data collection
and processing necessary to support billing or bill
verification has to be sufficiently complex to deal with an
extremely wide range of variables. This is in contrast to the
situation where a network provides only inter-trunk exchange
transmission, or only the local call delivery.

BRIEF DESCRIPTION OF THE DRAWINGS

A system according to an embodiment of the present
invention is now described, by way of example only, with
reference to the accompanying drawings, in which:

FIG. 1 shows diagrammatically the architecture of a
system for collecting and processing data comprising call
information so as to support an accounts system for call
instances incoming to a telecommunications network;

FIGS. 2, 3 and 4 show overview flow diagrams for a
system as shown in FIG. 1;

FIG. 5 shows a hardware and communications diagram
for the system of FIG. 1;

FIG. 6 shows a software architecture for use between a
streamer and a data analyzer in a system according to FIG. 1;

FIG. 7 shows an architecture for hardware providing a
company system for use in the system of FIG. 1;

FIG. 8 shows a schematic diagram of batch array
processing architecture for use in the company system of
FIG. 7;

FIGS. 9 and 10 show an exchange file and Advanced
Protocol Data Unit (APDU) format with respect to polling of
data from exchanges for use in the system of FIG. 1;

FIGS. 11 to 21 show flow diagrams for use in a streamer
and data analyzer of a system according to FIG. 1;

FIG. 22 represents process interactions between elements
of the system of FIG. 1;

FIGS. 23 to 30 provide entity life history diagrams,
showing the status that a record within each entity might be
in, and from that status which other statuses can be reached
by which actions;

FIGS. 31 and 32 present the state of an agenda, and a
pattern net following data population and firing of a rule, in
an expert system for use in the system of FIG. 1;

FIGS. 33 and 34 show object hierarchies, for a rule base
system and case base system respectively, for use in a data
analyzer of a system according to FIG. 1;

FIG. 35 shows design principles involved in building an
expert system/ORACLE interface for a data analyzer in a
system according to FIG. 1;

FIG. 36 shows a data model for a company system for use
in a system according to FIG. 1;

FIGS. 37 to 43 show flow diagrams relevant to operation
of a data analyzer for use in the system of FIG. 1;

FIG. 44 shows a data flow, with emphasis on data relevant
to a company system for use in a system according to FIG. 1.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

In some parts of the following description and Figures, the
terms "INCA" and "IDA" have occasionally been used.
These stand for Inter-Network Call Accounting, a
description of the whole system, and for INCA Data
Analyzer, respectively. The latter is a reference to the data
analyzer 7, which comprises an expert system and interfaces
with the Streamer 6.

The description below is set out in the following manner:

1. FIG. 1: BLOCK VIEW OF ARCHITECTURE

2. FIGS. 2, 3, AND 4: FLOW DIAGRAMS FOR PROCESS OVERVIEW

i) Point of Interconnect and DDC

ii) Streamer

iii) Company System (or Box)

3. FIGS. 1 AND 5 TO 8: HARDWARE, COMMUNICATIONS AND SOFTWARE ARCHITECTURES

i) POI and DDC 

ii) Streamer and Data Analyzer 

iii) Company System 

iv) Client Boxes 

4. FIGS. 9 AND 10: CALL RECORDS AND DATA FORMATS

i) Call Records

ii) Mapping Data Structures onto Exchange Data 

5. FIGS. 11 TO 19, AND 22 TO 30: MORE DETAILED FLOW DIAGRAMS FOR STREAMER AND DATA ANALYZER PROCESSES

i) Streamer: DDC Polling

ii) Streamer: FILE PROCESS 

iii) Streamer: DDC Deletion 

iv) Data Analyzer: Process 

v) Entity Life Histories

6. FIGS. 31 TO 35: EXPERT SYSTEM 

i) Overview 

ii) Rule Base Generic Rules 

iii) Case Base 

iv) Oracle Interface 

7. FIGS. 20, 21 AND 37 TO 43: USE OF EXPERT SYSTEM BY DATA ANALYZER

8. FIGS. 36 AND 44: COMPANY SYSTEM, DATA ANALYSIS AND PRICING AND CHARGING

9. AUDIT TRAIL

1. FIG. 1: BLOCK VIEW OF ARCHITECTURE

Referring to FIG. 1, the system is provided so as to collect
data in a first network 1, for example the BT PSTN, relating
to call instances arising in, or incoming from, a second
network 2. The data is collected at a Point of Interconnect
(POI) 3 provided by an exchange of said first network 1, and
brought to one of about ten district data collectors (DDCs)
5 in the PSTN. These hold data which comprises route
information for each incoming call, thus allowing
identification of, for instance, the intended destination of the
call and the carrier from which the call is incoming;
itemization data, so that each call is treated as an event; and
(preferably) calling line identity, so that calls which were
simply transit calls in the second network 2 can also be
accounted accurately with respect to the network in which
they arose.

Each district data collector (DDC) 5 is polled by a
streamer system 6 which expands and validates the call data
at both file and call record level, primarily against the
Routing Reference Model. (Although the district data
collectors 5 of the BT PSTN pick up the relevant data, their
role may equally be provided by other component systems of
an accounting arrangement, such as that known as a network
mediation processor.) Data which is found invalid by the
Streamer 6 is diverted to a data analyzer 7, where a
knowledge-based system is used to assess the invalidity and,
where possible, reform the data in an attempt to solve the
problem. This is an important component of the system,
since invalid data will generally be lost as an accountable
input. Validated data is meanwhile streamed according to the
operator associated with the second network 2 from which
the associated call was received, and passed on to the
company system 8.

The streamer 6 provides the following functions:

Poll each DDC 5 for files awaiting processing by the data
system of the present invention.

Validate the file and its call records against the Routing
Reference Model.

Expand the call records and allocate them to the correct
Telecom Network Operator.

Reject the invalid data to the IDA 7.

Copy the raw file received from the IDA 7 to the Raw
Record Backup Interface directory.

Delete the file from the DDC 5 once the data has been
secured on the interface directory.

Provide the user with a user interface to enter the Routing
Reference Model data.

Provide a complete audit trail through the streamer.

Provide the user with the ability to monitor the operation
and data integrity of the streaming operation.
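Two of the streamer functions listed above, polling against the DIRINDEX catalogue and deleting a file from the DDC 5 only once it has been secured, can be sketched in Python. The sketch assumes, for illustration only, that the DIRINDEX contents are available as a list of file names and that DDC files are reachable as local paths; files_to_poll and secure_then_delete are hypothetical names, and a size comparison stands in for whatever integrity check a real system would use.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Set


def files_to_poll(dirindex: Iterable[str], already_processed: Set[str]) -> List[str]:
    """Files listed in the latest DIRINDEX copy that have not yet been streamed."""
    return [name for name in dirindex if name not in already_processed]


def secure_then_delete(ddc_file: Path, backup_dir: Path) -> Path:
    """Copy a raw file to the backup directory, and only then delete the original.

    If the copy fails or is incomplete, an exception is raised and the
    file stays on the DDC, so no call data is lost.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    secured = backup_dir / ddc_file.name
    shutil.copy2(ddc_file, secured)
    if secured.stat().st_size != ddc_file.stat().st_size:
        raise OSError(f"incomplete copy of {ddc_file}")
    ddc_file.unlink()
    return secured
```

Ordering the delete strictly after a checked copy means a failure at any point leaves at least one complete copy of the raw data.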
The data analyzer 7 provides the following functions:

Poll an interface directory for files containing one or more
errors.

Hold the incorrect call records in a suspense area if they
are valid call records but do not match the Routing
Reference Model.

Provide a user interface so that users can re-stream the
data after the Routing Reference Model has been updated.

Apply default call record values to fields that are incorrect,
in accordance with the rules specification.

Stream any correct data that has not already been streamed
because the error thresholds were exceeded.

Stream any corrected data.

Provide a complete audit trail through the IDA 7 at a call
record level.
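The treatment of an errored call record by the data analyzer 7 might be sketched as below, assuming that each field has a hypothetical validation function, that the rules specification supplies default values for some fields, and that the Routing Reference Model can be reduced to a set of known routes. The triage function and its three outcomes illustrate the listed functions; they are not the knowledge-based system of the embodiment.

```python
from typing import Callable, Dict, Set, Tuple

Record = Dict[str, str]
Validators = Dict[str, Callable[[str], bool]]


def triage(record: Record, validators: Validators, defaults: Record,
           known_routes: Set[str]) -> Tuple[str, Record]:
    """Classify one errored call record as 'stream', 'suspense' or 'reject'."""
    fixed = dict(record)
    for name, is_valid in validators.items():
        # Substitute a default value for an incorrect field where a rule allows it.
        if not is_valid(fixed.get(name, "")) and name in defaults:
            fixed[name] = defaults[name]
    if not all(is_valid(fixed.get(name, "")) for name, is_valid in validators.items()):
        return "reject", record          # cannot be corrected automatically
    if fixed.get("route", "") not in known_routes:
        return "suspense", fixed         # valid, but awaits a Routing Reference Model update
    return "stream", fixed
```

Records in suspense are returned already corrected, so they can be re-streamed unchanged once the Routing Reference Model has been updated.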

The company system 8 also has an important role to play,
because it is the company system which imports factors
derived not only from call parameters but also from the
relationship between the operators of the two interconnected
networks 1, 2. The impact a call will have in an accounting
procedure will be partly determined by such factors as the
"service level agreement" between the relevant operators. It
is at the company system 8 that these factors are brought
into play, by reference to various information sources which
may include look-up tables and/or the National Charging
Database (NCDB) 9. With particular regard to the latter,
account is also taken here of, for instance, time-varying
charge rates.

The output from the company system 8 is thus, finally,
information for use in an accounting system, representing
the raw call data collected from the point of connection 3
and processed with reference to the relevant parameters,
such as operator-specific and time-varying parameters,
which should apply. This output is provided to a client
system 10 which gives user access by means of a personal
computer.
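As an illustration of how time-varying charge rates might enter the pricing, the following Python sketch prices a single call at a per-minute rate looked up by operator and time band. The two-band tariff, the rate table and the function names are hypothetical; in the described system such factors would come from look-up tables, agreements between operators and the NCDB 9. Decimal arithmetic is used so that monetary amounts are not subject to binary floating-point rounding.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, Tuple

# Hypothetical per-minute rates keyed by (operator code, time band).
Rates = Dict[Tuple[str, str], Decimal]


def time_band(start: datetime) -> str:
    """Assumed two-band tariff: 'day' from 08:00 to 17:59, otherwise 'evening'."""
    return "day" if 8 <= start.hour < 18 else "evening"


def price_call(operator: str, start: datetime, duration_s: int, rates: Rates) -> Decimal:
    """Price one call at the rate in force when it started."""
    per_minute = rates[(operator, time_band(start))]
    amount = per_minute * Decimal(duration_s) / Decimal(60)
    return amount.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```

A call that spans a band boundary is priced entirely at the rate in force when it started; a real tariff may split such calls.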

2. FIGS. 2, 3 AND 4: PROCESS OVERVIEW 

Referring to FIGS. 2, 3 and 4, flow diagrams can be used 
to give a process overview of the above, in operation in 
response to a call instance. 
2(i) Point of Interconnect and DDC 

FIG. 2 shows process steps carried out by the POI 
exchange 3 and by the DDC 5 in response to an incoming 
call. All these steps are known, the exchange 3 and DDC 5 
being unmodified for the purpose of the present invention. 

Referring to FIG. 2, a call incoming to or outgoing from
the relevant network 1 generates, at step 200, a call record
in the POI exchange 3. At step 210, the exchange 3 gives
every call instance a "File Generation Number" in the series
0-9999. At step 220, the exchange 3 groups the call records
into Advanced Protocol Data Units (APDUs) and groups the
APDUs into files.

At step 230, the DDC 5 polls the exchange 3 for all call 
record data in APDU format. At step 235, the DDC 5 adds 
control data in the form of a header APDU and a trailer 
APDU. The DDC 5 also, a step 240, gives each file a file 
sequence number in the range from 0-999999, and at step 
245 gives each APDU an APDU sequence number in the 
range 0-16353, APDUs being in binary format. At step 250, 
the DDC 5 places the files in a directory structure, from 
which the Streamer 6 is able to pick them up by polling. At 
the same time, at step 260, an entry is made for the file, in 
a catalogue file which is called DIRINDEX. This catalogue 
file contains a list of all files available to be polled by the 
Streamer 6. 
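A minimal sketch of the numbering at steps 240 and 245 models each DDC sequence number as a counter that returns to zero at the top of its stated range. Three things here are assumptions: that the DDC wraps in this way, that APDU numbers continue from file to file rather than restarting, and the names `WrappingCounter` and `number_file`, which are hypothetical.

```python
class WrappingCounter:
    """Issues sequence numbers 0..limit, wrapping to 0 after limit (assumed)."""

    def __init__(self, limit: int, start: int = 0) -> None:
        if not 0 <= start <= limit:
            raise ValueError("start must lie in [0, limit]")
        self.limit = limit
        self._next = start

    def take(self) -> int:
        value = self._next
        # After the top of the range has been issued, start again at zero.
        self._next = 0 if value == self.limit else value + 1
        return value


# Ranges stated in the text for the DDC.
FILE_SEQ_MAX = 999_999   # file sequence numbers 0-999999 (step 240)
APDU_SEQ_MAX = 16_353    # APDU sequence numbers 0-16353 (step 245)


def number_file(apdu_count: int, file_seq: WrappingCounter,
                apdu_seq: WrappingCounter) -> tuple:
    """Return (file sequence number, [APDU sequence numbers]) for one file.

    Assumes APDU numbering continues from one file to the next.
    """
    return file_seq.take(), [apdu_seq.take() for _ in range(apdu_count)]
```

Keeping one counter per range isolates the wrap rule in one place. This matters because the header and trailer APDUs carry first and last APDU sequence numbers that a receiver may compare.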
2(ii) Streamer 

Referring to FIG. 3, at step 300, the Streamer 6 polls the 
DDC directory structure periodically, entering the call 
records into a random access memory (RAM), each file 
being loaded into 1 Mbyte. This polling process includes the 
step of copying the latest DIRINDEX file. At step 310, 
which can form part of the DDC polling process at step 300,
the data is converted from binary to ASCII (American 
Standard Code for Information Interchange) format. 

At step 320, the Streamer 6 carries out validation of the 
call records. If a call record is incomplete or incorrect so that 
it cannot be verified, instead of proceeding to subsequent 
processing steps in the Streamer 6 and ultimately to a billing 
process, it is diverted to an interface directory (step 330) for 
further analysis in the incorrect data analyzer 7. 

Valid data, however, progresses at step 340 to an
identification process in which data in the
call record is used to establish what other network the call 
originated in, or entered the BT PSTN from, or in some 
circumstances was destined to terminate in. A code repre- 
senting the relevant network operator for billing is added to 
the call record and the files are then broken down and 
restructured, at step 350, according to that code. Hence the 
call records at this point can be sorted according to the 
network operator, or other relevant entity, who is liable at 
least at first instance for the cost of those calls. 
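Steps 320 to 350 amount to a filter followed by a group-by. The sketch below shows that flow under stated assumptions. `CallRecord` holds a hypothetical subset of fields. The validation rule and the record-to-operator lookup are supplied by the caller, because the actual rules are not given here. Treating a record whose operator cannot be identified as a diversion is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class CallRecord:
    calling_line: str    # hypothetical subset of Type 6 fields
    called_number: str
    route_group: str


def stream_records(
    records: List[CallRecord],
    is_valid: Callable[[CallRecord], bool],
    operator_for: Callable[[CallRecord], Optional[str]],
) -> Tuple[Dict[str, List[CallRecord]], List[CallRecord]]:
    """Group valid records by operator code; divert the rest."""
    by_operator: Dict[str, List[CallRecord]] = defaultdict(list)
    diverted: List[CallRecord] = []
    for rec in records:
        # Step 320: validation; step 340: identify the liable operator.
        code = operator_for(rec) if is_valid(rec) else None
        if code is None:
            diverted.append(rec)            # step 330: to the interface directory
        else:
            by_operator[code].append(rec)   # step 350: restructure by code
    return dict(by_operator), diverted
```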

At steps 360 and 370, the Streamer 6 then outputs the 
newly structured files to the Company System 8 and deletes 
file data from the FTAM filestore on the DDC 5. 

Looking at the data analyzer 7, this has an important role 
to play since data which cannot be validated cannot be 
billed. The data analyzer 7, at step 380, polls the interface 
directory for errored files entered by the Streamer 6 at step 
330. The data analyzer 7 then has three different chances to 
put errored data back into the system. 

At step 382, it looks to repair the data. If it can, the 
repaired data is returned to the interface directory, from
which the Streamer 6 can pick it up. At step 384, the data 
analyzer 7 looks to apply default values to unrepairable data. 
Some data elements cannot be "patched" in this manner, for 
instance because it would affect an audit trail. Lastly, at step 
386, the data analyzer 7 checks whether there is simply a 
mismatch between the data and the Routing Reference 
Model (RRM). The latter is a database giving routing
information and is used at the DDC 5 to identify, for instance,
the destination of a call. Copies of the RRM are held at
different places in a communications network and, if one
copy is updated out of time or incorrectly, this can give rise to
a mismatch in data. If this appears to be the case, the
analyzer 7 enters those call records into a suspend file (step
388), which allows them to be put back into the Streamer 6
process after the RRM has been checked.

If the data analyzer 7 cannot deal with the data in any of 
the above ways, it outputs it, at step 390, to a "sump". This 
means the data is effectively lost and will never be billed. It
might however be useful in analysis so that changes and 
corrections can be made to the system in the long term. 
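The data analyzer's three chances, followed by the sump, form an ordered fallback. The sketch below shows that order only. The repair, default and RRM-mismatch checks are passed in as hypothetical callables, since their actual rules are not given here. Where defaulted data goes afterwards is likewise not specified.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    REPAIRED = "returned to the interface directory"       # step 382
    DEFAULTED = "default values applied"                   # step 384
    SUSPENDED = "held in suspend file pending RRM check"   # step 388
    SUMP = "sent to sump, never billed"                    # step 390


def triage(
    record: dict,
    repair: Callable[[dict], Optional[dict]],
    apply_defaults: Callable[[dict], Optional[dict]],
    rrm_mismatch: Callable[[dict], bool],
) -> Outcome:
    """Give a failed record the data analyzer's three chances, in order.

    `repair` and `apply_defaults` return a corrected record, or None when
    they cannot help (e.g. a default would affect the audit trail).
    """
    if repair(record) is not None:
        return Outcome.REPAIRED
    if apply_defaults(record) is not None:
        return Outcome.DEFAULTED
    if rrm_mismatch(record):
        return Outcome.SUSPENDED
    return Outcome.SUMP
```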
2(iii) Company System

Referring to FIG. 4, data, at the file level, which has been
validated and processed by the Streamer 6 is input to the
Company System 8, where the first step, step 400, is
validation of the file sequence number. The Company System 8
processes files in file sequence number order, but the
Streamer 6 has processed data in parallel from different
exchanges 3. If the file sequence number is wrong, the
Company System invalidates the file and stops processing it
(step 410).
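Because the Streamer 6 works on several exchanges in parallel, one plausible reading of step 400 is that order is checked per source exchange. The sketch below assumes this. It also assumes that the first file seen from an exchange sets the baseline and that numbers wrap after 999999. `SequenceChecker` is a hypothetical name.

```python
from typing import Dict

FILE_SEQ_MODULUS = 1_000_000  # file sequence numbers run 0-999999


class SequenceChecker:
    """Tracks the next expected file sequence number for each exchange."""

    def __init__(self) -> None:
        self._expected: Dict[str, int] = {}

    def accept(self, exchange: str, seq: int) -> bool:
        """Return True if the file may be processed (step 400).

        The first file seen from an exchange is accepted as the baseline.
        """
        expected = self._expected.get(exchange)
        if expected is not None and seq != expected:
            return False  # step 410: invalidate the file, stop processing it
        self._expected[exchange] = (seq + 1) % FILE_SEQ_MODULUS
        return True
```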

If the file sequence number is acceptable, the Company 
System 8 moves on at step 420 to validate the call record, 
this time not primarily in terms of the RRM, as at the 
Streamer 6, but with more emphasis on data relevant to the 
billable entity and the relationship between the billable 
entity and the operator of the first network 1, for instance BT. 
The billable entity and BT will have entered into a service 
level agreement (SLA) and the call record might indicate a
call type not available to that billable entity under the current 
SLA. The Company System 8 will pick that up as an 
invalidity and, at step 430, attempt to fix the call record in 
error. If the call record can be fixed, it is sent to be bulked, 
at step 440, and re-entered to the Company System 8. If it 
cannot be fixed, it is stored, in step 450, for analysis. 

Valid call records meanwhile are forwarded to the Com- 
pany System pricing engine, step 460, at which individual 
call records are priced in accordance with the NCDB 9, the 
SLA between the relevant billable entity and BT, and any 
other relevant information. The priced call records can then 
be loaded into a summary database, step 470, for charging 
to the relevant billable entity, and the call records are output 
to optical disk (step 480) for storage. 

Client Boxes 10 receive downloaded information from the 
summary database on a weekly basis. Each Client Box 10 is 
dedicated to a single billable entity and can also be used to 
access the optical disk storage, to obtain its "own" call 
records only. 
3. FIGS. 1, 5, 6, 7 AND 8: HARDWARE,
COMMUNICATION AND SOFTWARE 
ARCHITECTURES 
3(i) Point of Interconnect 3 and DDC 5 

The exchanges 3 and DDCs 5 are of known type and are 
not described in detail herein. They operate, briefly, as 
follows. 

Referring to FIGS. 1 and 2, any call coming into or 
leaving the British PSTN operated by British
Telecommunications plc (BT) will nowadays pass through a digital
telephone exchange as the Point of Interconnect (POI) 3. All
such exchanges relevant to the data system of the present 
invention are currently System X telephone exchanges of 
types Digital Junction Switching Unit (DJSU), Digital Local 
Exchange (DLE) or Digital Main Switching Unit (DMSU). 

Every telephone call going into or leaving the BT network 
1, as shown at step 200 of FIG. 2, generates a call record
within the POI exchange 3 in the format known as British 
Telecom Call Record Type 6. The System X POI exchanges 
3 are polled daily by the DDCs 5, at step 230, for all call 
record data in APDU format. Polling takes place using the 
File Transfer Access and Management (FTAM) protocol 
across high-speed BT data links. DDCs 5 act purely as 
collectors of call record files from POIs: no processing of
call records takes place within a DDC. DDCs are not 
dedicated to call record polling, but perform a variety of 
other data collection, processing and forwarding tasks. 

In order for the streamer system 6 to gain access to the 
FTAM filestore on the DDC 5, it is necessary to provide 
identification. This is done by allocating a Network Nodal 
Identity (NNI) to the streamer 6 as a relevant end system. 
The NNI is then used as a username for gaining access to the 
FTAM filestore, along with a password. 
3(ii) Streamer 6 and Data Analyzer 7 

Referring to FIG. 5, the hardware and communications 
diagram for the streamer 6 and the data analyzer 7 may be 
as follows. (It should be understood that the communica- 
tions architecture of FIG. 5 represents only one of any 
number of communications architectures that might be suit- 
able in different environments.) The streamer 6 has a "hot- 
standby" Streamer Box Backup (SBB) 6a which cuts in as 
soon as a fault on the main streamer system 6 occurs, and 
both can be provided on Hewlett-Packard HP857S mini- 
computers running the UNIX operating system. The 
streamer 6 and SBB 6a might be connected to local area 
networks (LANs) 515. 

Raw data polled by the streamer 6 (or hot-standby 6a) 
from DDCs 5 (not shown in FIG. 5) is backed up using an 
optical disc storage system (not shown). The data is polled 
using FTAM (File Transfer, Access and Management) over 
BT Megastream high-speed data lines and a Multi-Protocol 
Routing Network (MPRN) 500. The MPRN 500 is OSI 
(Open Systems Interconnection) compliant. There are direct 
communication links 515 between the streamer 6 and the 
data analyzer 7 and an "Ethernet" bridge 505 gives the 
streamer 6 and the data analyzer 7 access to at least one wide 
area network (WAN) 510, for instance that used by BT for 
the PSTN. The WAN 510 in turn gives access to a company 
system 8 and client boxes 10 situated at the primary BT 
PSTN network management and data center. This means that 
the network management and data center can input and 
output data, for instance for analysis and to input initial and 
updated routing reference data. 

Referring to FIG. 6, the Data Analyzer 7 might be 
provided on a Hewlett-Packard HP9000. Software for the 
Streamer 6 and the Data Analyzer 7 utilizes the following 
technologies: 

IEF for Batch Processes 

ART-IM for Expert System Capabilities

HP/UX Version 9.0

Business Objects as a PC Client for reports

Oracle Version 6

SQL*Forms 3

SQL*Report Writer 1.1

PL/SQL Version 1.0

PRO*C

SQL*NET TCP/IP Version 1.2

All these are known and publicly available. For instance,
"IEF" is the "Information Engineering Facility" Computer
Aided Software Engineering (CASE) software from James
Martin Associates, a software engineering tool which
generates executable code. The data analyzer processes run
physically on the data analyzer 7 platform and use
SQL*NET to connect to an Oracle database 60 on the
Streamer 6 platform. SQL*NET TCP/IP (Transmission Control
Protocol/Internet Protocol) can also be used by Streamer/
Data Analyzer Business Objects Oracle users 65 in order to
access the Oracle database 60 located on the Streamer 6 over
the MPRN 500, or a suitable TCP/IP bearer network.



The Streamer 6 and the data analyzer 7 share database 
facilities 60 and the users may require access for instance to 
check or update reference data used in validation by the 
Streamer 6. The database facilities 60, inter alia, maintain 
control over which files from the DDCs 5 have been 
processed and contain a version of the Routing Reference 
Model. 

PRO*C code is generated by the IEF into the IEF code 61 
and External Action Blocks (EABs) 62 as shown in FIG. 6. 

The Streamer/Data Analyzer software library 63 is a set of 
"C" and PRO*C modules, callable from within EABs 62 or 
from the ART-IM (Automated Reasoning Tool for Information
Management) 64. ART-IM is proprietary expert-system
application development software. The ART-IM
development is conducted within "studio", a Motif interface 
to the expert system. Once the expert system has been unit 
tested within the "studio", it is deployed by generating "C" 
modules from within the "studio". Hence, for instance, 
processes can be created by generating the IEF Code 61 on 
an OS/2 workstation, and linking the code with EABs 62, the 
Streamer/Data Analyzer software library 63 and the ART-IM 
code library 64 on the deployment platform. 
3 (iii) Company System 8 

Referring to FIGS. 7 and 8, the Company Box (or System) 
8 comprises a Hewlett-Packard minicomputer 70, "Emerald 
890/400", running the UNIX operating system, the 
ORACLE relational database management system (RDMS) 
and a custom application written using the IEF CASE 
software from James Martin Associates. 

Within the Company Box 8, all call records are priced 
according to complex pricing and charging reference tables, 
and ORACLE summary tables are incremented. Reference 
tables provide exchange set-up data, routing reference data, 
accounting agreements, pricing and charging data and vari- 
ous classes of exception. Pricing and charging reference 
tables are derived from BT's National Charging Data Base 
(NCDB) and inter-operator wholesale pricing agreements. 

To help the minicomputer with the very high volume of 
processing tasks involved, Hewlett-Packard UNIX worksta- 
tions 80, for example "735s", are attached as co-processors 
which bid for processing tasks. A virtually unlimited number 
of such workstations may be connected to the minicomputer 
70 to increase the number of call records that the Company 
Box can process but a reasonable minimum for the BT 
PSTN might currently be for instance twelve. As stated 
earlier, it may be that the data system of the present 
invention will be required to process 60 million call records 
per day by 1995. The arrangement relies on the Hewlett 
Packard product known as "Task Broker" 81, the data 
system of the present invention being set up to run on a batch 
array. In order to do so, custom parameters need to be fed 
into Task Broker, and an appropriate set of these parameters
is listed below:

i) Global Parameter Settings (which are optional)

which clients may access the server
which machines may remotely administer Task Broker
which network mask is to be used
smallest and largest UID (user identity) allowed
logging verbosity
maximum number of task submittals to be processed
concurrently
list of machines that the client should contact for services.

ii) Client Parameter Settings (which are optional)

list, for each service, of the servers the client should
contact for service.

iii) Class Parameter Settings

every service must be in a class; set up a class for each
type of service each machine will provide.

iv) Service Definitions (for every service, the following
must be specified)

class
affinity
arguments

Note: affinity is a number between 0 and 1,000 which
indicates how well a node is able to provide a service.
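Task Broker's own bidding logic is not described here. Purely to illustrate what an affinity value expresses, the sketch below assumes that, among the nodes offering a service, the one declaring the highest affinity is chosen. Ties go to the first node name in sorted order. `choose_server` is a hypothetical helper and is not part of Task Broker.

```python
from typing import Dict, Optional


def choose_server(affinities: Dict[str, int]) -> Optional[str]:
    """Pick the node declaring the highest affinity (0-1000) for a service.

    Ties are broken by node name so that the choice is deterministic.
    """
    for node, value in affinities.items():
        if not 0 <= value <= 1000:
            raise ValueError(f"affinity for {node} out of range: {value}")
    if not affinities:
        return None
    # max() returns the first maximal item, i.e. the alphabetically first tie.
    return max(sorted(affinities), key=lambda node: affinities[node])
```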

Task Broker is a queuing system which controls which work
stations bid for and process files. In order to use Task
Broker with the company system 8, there are three programs
and a configuration file. The configuration file sets up the
parameters Task Broker needs to operate in the company
system environment, including which work stations it can
communicate with, what programs to call to process a file,
and how to prioritize. It is the configuration file parameters
which are set out above.

The three control programs operate (in summary) as
follows. When a file comes to the Emerald minicomputer of
the company system 8, a master program "run_cp.sh" sends
it to be processed via Task Broker and kicks off a monitoring
program "cleanup_cp.sh" in the minicomputer. Task Broker
allocates the file to a work station, which processes the
file according to a third program "cp.sh". If things go
smoothly, the file returns to the minicomputer, where
"cleanup_cp.sh" allocates it to the correct directories of a
client system 10. The "cleanup_cp.sh" program also monitors
the work stations. If there is an overlong delay in processing
by a work station, it will shut down Task Broker on that work
station, since there is clearly then a problem. Lastly,
"cleanup_cp.sh" also controls recording and event logging.

Finally, as well as an output to the client system 10, priced 
call records from the Company Box 8 are saved to an array 
of Optical Disc Drives 71, so that individual priced call
records may be retrieved and analyzed in future. 

3 (iv) Client System (or Boxes) 10

Summary ORACLE database tables of interconnect calls
are downloaded weekly from the Company Box 8 to Client
Boxes 10. Each Client Box (CLB) 10 is a Hewlett-Packard
UNIX workstation, and deals only with summary database
tables and call records generated under a single
interconnection agreement between BT and another operator, for
example Manx Telecom. A Client Box 10 runs ORACLE
RDMS and Business Objects software. Information from
each Client Box 10 allows BT not only to bill another
network operator in respect of its use of BT's network, but
also to verify incoming bills from another network operator
to BT. Each Client Box 10 can also interrogate the optical
discs 41, but only for call records under the interconnection
agreement associated with that Client Box 10. It is not
possible for a Client Box to interrogate the Company Box 8
directly for its own call records, let alone those relating to
other agreements between BT and other operators. Personal
computers are connected to a Client Box 10 to allow
analysis of the Summary Tables.

4. FIGS. 9 AND 10: CALL RECORDS AND 
DATA FORMATS 

4 (i) Call Records

British Telecom Type 6 call records are generated for the
primary purpose of billing customers. Call records should
contain sufficient information to price a call accurately,
based on date, time, duration, distance to be travelled and
other factors. Each Type 6 call record can include the
following:

length of billing record;
record use;
record type;
call type & call effectiveness;
clearing cause;
time discontinuity flag (change to/from GMT from/to BST during call);
calling line identity (CLI);
route group type;
sampling category;
route group;
node point code (NPC): unique identifier for POI exchange producing record;
linking field (used when call straddles more than one time-charge band);
calling party category (business, residential payphone);
charge band;
date and time of address complete;
date and time of answer;
date and time of calling party clear;
date and time of called party clear;
called number field.

Call records are captured by the Call Accounting Subsystem
(CAS) Revenue Apportionment and Accounting
(RAA) facility on all System X exchanges. As mentioned 
above, at step 220 call records are grouped together into 
APDUs, and APDUs are further grouped into a file, with 
each file being up to 1 Mbyte in size. Nothing in this 
grouping process within a System X POI exchange destroys 
any parts of individual call records. All call records are in 
simple binary format. 

Referring to FIG. 9, each exchange file 40 contains a
number of APDUs 51, which are of variable length. Each
APDU 51 contains a number of billing records, which are
also of variable length. The following maximum sizes are,
however, fixed:

Exchange File Maximum Size . . . 1 Megabyte
APDU Maximum Size . . . 512 Bytes
Billing Record Maximum Size . . . 170 Bytes
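A check of the three fixed maxima is straightforward. In this sketch, reading "1 Megabyte" as 2**20 bytes is an assumption, and `size_errors` is a hypothetical name. It reports every limit that is exceeded rather than stopping at the first.

```python
from typing import Iterable, List

MAX_FILE_BYTES = 1024 * 1024   # "1 Megabyte", assumed to mean 2**20 bytes
MAX_APDU_BYTES = 512
MAX_RECORD_BYTES = 170


def size_errors(file_size: int, apdu_sizes: Iterable[int],
                record_sizes: Iterable[int]) -> List[str]:
    """Return a description of every fixed maximum that is exceeded."""
    errors: List[str] = []
    if file_size > MAX_FILE_BYTES:
        errors.append(f"file {file_size} > {MAX_FILE_BYTES}")
    errors += [f"APDU {i}: {n} > {MAX_APDU_BYTES}"
               for i, n in enumerate(apdu_sizes) if n > MAX_APDU_BYTES]
    errors += [f"record {i}: {n} > {MAX_RECORD_BYTES}"
               for i, n in enumerate(record_sizes) if n > MAX_RECORD_BYTES]
    return errors
```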

The DDC Header and Trailer APDUs are identical apart
from the APDU type, which is 241 for the header APDU and
245 for the trailer APDU.

The following information is available in the header and
trailer APDU:

APDU Length . . . Length of header/trailer APDU
APDU Type . . . 241 for header, 245 for trailer
Unique File Identifier . . . See below concerning DIRINDEX
Destination NNI . . . NNI of INCA Streamer
Application Group . . . Application Group of INCA data (= 14)
Input Tape/Cartridge Seq. No. . . . Sequence Number of tape/cartridge
Output File Seq. No. . . . DDC Sequence Number
Timestamp DDC Received Data . . . Date and Time data received by DDC
Partfile Indicator . . . Indicates whether file is a part-file
Exception Indicators . . . Indicates what may be wrong with file
Read Count . . . No. of times this file has been read
Filesize . . . Size in bytes of this file
Count of Unselected APDUs . . . No. of APDUs of wrong APDU type
Selected APDU Type . . . APDU type of INCA data type
APDU Count . . . Number of APDUs in this file
First Seq. No. . . . Starting APDU Sequence Number
Last Seq. No. . . . Ending APDU Sequence Number
The read count represents the number of times this file has
been polled from the DDC by the Streamer 6. The partfile
indicator indicates whether the whole file was received by
the DDC 5 successfully or whether parts of the file were
missing.

The exception indicators are two 1-byte bitmask fields
which indicate any errors that were detected by the DDC 5
relating to this transfer.

The valid values for all of the fields above will be
validated within the computer aided software engineering
(CASE) application software described below with
reference to the "COMPANY SYSTEM (OR BOX)" 8.

Referring to FIG. 10, a brief description of the APDU 
structure 51 would indicate the APDU header 52, the actual 
billing records 53 concerned, and the APDU trailer 54.

The format for the billing records 53 is of standard type 
and a "C" structure can be designed to map exactly onto that 
format. 

When data has been polled from the exchanges 3 to the
DDC 5, some of the data which is at the head of each data
APDU is stripped out by the DDC 5. This data is
representative of the DDC 5 and of the exchange 3 and is not
relevant as a data feed for an end-processing system.

When the file is copied into an appropriate directory by a
DDC 5, such that it is made available for the streamer 6 to
copy using FTAM, an entry is made for it in a catalogue file,
called DIRINDEX. The DIRINDEX file entry carries the
following data:

i) activity marker (1 byte), which may show:

a) inactive entry

b) file available for transfer

c) file currently being used (e.g. in FTAM transfer)

d) file successfully transferred (not yet deleted)

ii) INCA filename format

iii) output routine, which may show:

a) file available for FTAM

b) magnetic tape only

iv) unique file identifier, including details such as the
creation time and the relevant exchange NNI

v) file size in bytes

vi) number of APDUs in file.

The INCA filename format of item ii) includes:

vii) the streamer NNI

viii) NNI and cluster number of exchange

ix) application group of INCA data

x) DDC file sequence number of exchange file.
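The activity marker and output routine together decide whether the Streamer 6 can fetch a file. The sketch below models a DIRINDEX entry under one stated assumption: the real 1-byte codes are not given here, so the enum values are placeholders. `DirindexEntry` and `files_to_poll` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Activity(Enum):
    # Meanings from the text; the actual 1-byte codes are not given,
    # so these values are placeholders.
    INACTIVE = 0
    AVAILABLE = 1
    IN_USE = 2
    TRANSFERRED = 3


@dataclass
class DirindexEntry:
    activity: Activity
    ftam_output: bool   # output routine: True = available for FTAM, False = tape only
    filename: str       # INCA filename format
    size_bytes: int
    apdu_count: int


def files_to_poll(entries: List[DirindexEntry]) -> List[str]:
    """Names of files the Streamer could fetch by FTAM now."""
    return [e.filename for e in entries
            if e.activity is Activity.AVAILABLE and e.ftam_output]
```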
4 (ii) Mapping Data Structures onto Exchange Data 

The streamer 6 maps the data into data structures for use 
in the model for the Company Box 8, using the following 
principles: 

It is assumed that the APDU length fields and the billing
record length fields will be correct. If they are not then the 
validation will fail at either the APDU level or the Billing 
Record level, and the file will be streamed to the Data 
Analyzer 7. 

The input data will initially be scanned to find the Header
APDU 52. This will be identified by an APDU type of 241
(hex F1). The selected APDU type field will then be
checked along with the Unique File Identifier to establish
that this is indeed the header APDU 52.

After the header 52 has been found and the header APDU
data structure has been mapped, it is assumed that all of the
APDUs in the file will follow the data standard of a one-word
record length followed by an APDU, e.g.
/HEADER_APDU/RL/APDU/RL/APDU . . . /RL/APDU/RL/TRAILER_APDU/
where RL is the Record Length.
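The mapping rules above can be sketched as a small parser. The byte layout inside an APDU is not given here, so the sketch makes hypothetical assumptions. Every APDU is taken to start with a 2-byte big-endian length followed by a 1-byte APDU type, and the one-word RL is taken to be 2 bytes big-endian. The scan for the header simply takes the first 0xF1 byte and omits the check against the selected APDU type and Unique File Identifier. Raising `StreamToAnalyzer` stands in for streaming the file to the Data Analyzer 7.

```python
import struct
from typing import List, Tuple

HEADER_TYPE = 241   # 0xF1
TRAILER_TYPE = 245  # 0xF5
TYPE_OFFSET = 2     # hypothetical: 2-byte APDU length, then 1-byte APDU type


class StreamToAnalyzer(Exception):
    """The file deviates from the expected layout; send it to the Data Analyzer."""


def split_apdus(data: bytes) -> Tuple[bytes, List[bytes], bytes]:
    """Split a DDC file into (header APDU, data APDUs, trailer APDU)."""
    # Scan for the header type byte, allowing for the assumed 2-byte length field.
    hit = data.find(bytes([HEADER_TYPE]), TYPE_OFFSET)
    if hit < 0:
        raise StreamToAnalyzer("no header APDU (type 241) found")
    start = hit - TYPE_OFFSET
    (header_len,) = struct.unpack_from(">H", data, start)
    header = data[start:start + header_len]
    if header_len <= TYPE_OFFSET or len(header) != header_len:
        raise StreamToAnalyzer("header APDU length is inconsistent")

    apdus: List[bytes] = []
    pos = start + header_len
    # After the header: /RL/APDU/RL/APDU/ ... /RL/TRAILER_APDU/
    while pos + 2 <= len(data):
        (rl,) = struct.unpack_from(">H", data, pos)   # one-word record length
        apdu = data[pos + 2:pos + 2 + rl]
        if rl <= TYPE_OFFSET or len(apdu) != rl:
            raise StreamToAnalyzer(f"bad record length {rl} at offset {pos}")
        if apdu[TYPE_OFFSET] == TRAILER_TYPE:
            return header, apdus, apdu   # any bytes after the trailer are ignored
        apdus.append(apdu)
        pos += 2 + rl
    raise StreamToAnalyzer("file ended without a trailer APDU")
```

As in the text, a wrong record length misaligns everything that follows. It is therefore detected at the next APDU and the whole file is rejected. Data after the trailer is dropped.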
If the structure of the file deviates from this standard then
the file will be streamed to the Data Analyzer 7 for further 
analysis. This error condition will be detected within the 
validation of the APDU immediately following the devia- 
tion. 

Within each APDU it is assumed that the structure follows 
that of FIG. 6. Again any deviation from this structure will 
cause the whole data structure mapping to become mis- 
aligned, and will lead to the file being rejected and streamed 
to the Data Analyzer 7. 

It is assumed that there will be no data after the trailer 
APDU 54. Any data that does appear after the trailer APDU 
54 will be lost. 

5. FIGS. 11 TO 19 AND 22 TO 30: STREAMER 
AND DATA ANALYZER PROCESSES 
5 (i) Streamer: DDC Polling Process 

When files are received by the DDCs 5 they are validated 
(using checksumming) and some extra information is added 
to the beginning and end of the file, in the APDU Header and 
Trailer 52, 54, as mentioned above with reference to FIG. 2. 
These files are then made available for polling by the 
Streamer 6 by placing them in the directory structure to be 
used by the streamer 6, and updating the DIRINDEX file. 
This DIRINDEX file contains a list of all files available to 
be polled by the Streamer 6, and the Streamer 6 uses that list 
to ensure it has polled all new files. 

Referring to FIG. 11, the Streamer 6 will prepare to poll 
multiple DDCs 5 by going into a "Stream all DDCs" process.
At step 700, the streamer 6 "stream all DDCs" process is 
triggered, for instance at a specified time. At step 710, it runs 
a check that the Streamer 6 is available to receive files from 
the DDCs 5. If the Streamer 6 is available, it goes into a 
cycle, steps 720, 730 and 740, in which it runs through a list 
of the DDCs 5 and creates a "DDC Process" for each DDC 
5 to be polled. At the end of the list, this process finishes 
(step 750). 

For each DDC 5, the Streamer 6 will now run the "DDC 
process". Referring to FIG. 12, at steps 800, 805, the DDC 
process starts with a check as to whether either the DDC 5 
concerned for that process, or the Streamer 6, is shut down, 
and a check at step 810 as to whether DDC polling is due. 
There are certain times at which the DDCs 5 cannot be 
polled and step 815 runs a check as to whether a non-poll 
window applies. If not, step 820 looks for a free process slot 
to process files. If all these checks are clear, the streamer 6 
accesses the DDC DIRINDEX, step 825, and initiates the 
process list, step 830, and file list, step 835, which will 
ensure the streamer 6 applies all relevant processes to each 
of the exchange files received from the DDC 5. In steps 840,
845 and 850, the streamer 6 runs through the files from the
DDC DIRINDEX, creating its own log of the exchange files
to be processed, and provides a file processing capacity,
steps 855 and 860, to process the files in the file list. Once
all the exchange files from the DDC DIRINDEX list have
had processing capacity allocated, the "DDC process" 
updates its record of when the next poll is due, step 865, and 
goes back to sleep, step 870. 

The DDC process will, of course, be stopped, step 875, if
either the DDC 5 or the streamer 6 has shut down, and will 
remain in sleep mode, step 870, whenever a poll is not due, 
the DDC is in a non-poll window, or there is no available 
processing capacity. 
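The chain of checks at steps 800 to 820 decides whether the DDC process stops, sleeps or polls. The sketch below mirrors that order. The representation of non-poll windows as same-day (start, end) time pairs is an assumption, and windows crossing midnight are not handled. `DdcState` and `ddc_process_action` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Tuple


@dataclass
class DdcState:
    ddc_shut_down: bool
    streamer_shut_down: bool
    next_poll_due: datetime
    non_poll_windows: List[Tuple[time, time]]  # hypothetical same-day windows
    free_process_slots: int


def ddc_process_action(s: DdcState, now: datetime) -> str:
    """Return "stop", "sleep" or "poll" following FIG. 12's check order."""
    if s.ddc_shut_down or s.streamer_shut_down:
        return "stop"                  # steps 800/805 -> 875
    if now < s.next_poll_due:
        return "sleep"                 # step 810 -> 870
    t = now.time()
    if any(start <= t < end for start, end in s.non_poll_windows):
        return "sleep"                 # step 815: non-poll window
    if s.free_process_slots == 0:
        return "sleep"                 # step 820: no free process slot
    return "poll"                      # step 825 onward: read DIRINDEX
```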

Typical Event Cycle within DDC Polling Architecture

Assume the following:

DDC_POLLING_MPH=17 . . . This is the minutes past the hour to poll.
DDC_POLLING_INT_HRS=1 . . . This is how long to wait for the next poll, in hours.
DDC_DELAY_IN_DELETE=12 . . . How long to wait after the file has been
marked for deletion before the actual deletion request occurs.

The system was booted at 23:30 on the previous day.

Time Schedule

00:17 DDC Process wakes up, copies over the DIRINDEX file, and
      creates processes to stream data to the Streamer 6.
00:30 DDC Process finishes creating processes to stream files
      because either the maximum number of processes have been
      created OR all of the files available have been given to
      file processes to download.
      The wakeup time is calculated as 00:30 + DDC_POLLING_INT_HRS,
      with the minutes then set to DDC_POLLING_MPH:
      -> Next polling time = 00:30 + 1:00 = 01:30
         (set MPH) -> 01:17
      Calculate the number of seconds to sleep =
         TO_SECONDS(01:17 - CURRENT_TIME)
      Sleep(seconds_to_sleep)
      . . . File processes complete streaming of data
01:17 DDC Process wakes up . . .
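The wake-up arithmetic in this cycle can be reproduced with Python's standard `datetime` module. The function names are hypothetical. The parameter values are those assumed in the example above.

```python
from datetime import datetime, timedelta

DDC_POLLING_MPH = 17       # minutes past the hour to poll
DDC_POLLING_INT_HRS = 1    # polling interval in hours


def next_wakeup(finished: datetime, mph: int = DDC_POLLING_MPH,
                int_hrs: int = DDC_POLLING_INT_HRS) -> datetime:
    """Add the polling interval to the finish time, then set the minutes to MPH."""
    return (finished + timedelta(hours=int_hrs)).replace(
        minute=mph, second=0, microsecond=0)


def seconds_to_sleep(wakeup: datetime, now: datetime) -> int:
    """Whole seconds until wake-up, never negative."""
    return max(0, int((wakeup - now).total_seconds()))
```

With a finish time of 00:30 this gives 01:30, set back to 01:17, and 47 minutes (2820 seconds) of sleep, matching the schedule. Note that setting the minutes can shorten the effective interval to less than DDC_POLLING_INT_HRS.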
5 (ii) Streamer: File Process

Referring to FIG. 13, the operation of the File Process, 
created at step 855 during the DDC Process, is as follows. 
The File Process works from a file list received from the 
DDC Process, step 1302. Running through the file list, step 
1303, for each exchange file listed, File Process reads the 
exchange file log, step 1305, validates the call records, step 
1306, copies the file to a raw record backup, step 1307, for 
use if for instance the streamer 6 goes down subsequently, 
diverts the file to the data analyzer 7 if there was a validation 
failure, step 1310, or streams the file to the Company Box 8, 
step 1312. 

File Process stops, step 1313, if the DDC 5 or the 
Streamer 6 is shut down, step 1304 or if the files are 
seriously corrupted, step 1311, for instance because com- 
munications with the DDC 5 have failed. The exchange file 
log monitors what stage an exchange file has reached in 
relation to the Streamer 6 and will carry a status selected 
from active, processed and deleted for each file, where
"active" indicates it is being processed by the Streamer 6,
"processed" indicates it has been processed by the Streamer
6, and "deleted" means it has been deleted from the DDC 5
by the Streamer 6.
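The exchange file log can be pictured as a per-file state machine. The three statuses come from the text. The strictly forward order (no entry, then active, then processed, then deleted) is an assumption, as are the names `ExchangeFileLog` and `advance`.

```python
from enum import Enum
from typing import Dict, Optional


class FileStatus(Enum):
    ACTIVE = "active"        # being processed by the Streamer
    PROCESSED = "processed"  # processed by the Streamer
    DELETED = "deleted"      # deleted from the DDC by the Streamer


# Assumed forward-only order: (no entry) -> active -> processed -> deleted.
_NEXT: Dict[Optional[FileStatus], FileStatus] = {
    None: FileStatus.ACTIVE,
    FileStatus.ACTIVE: FileStatus.PROCESSED,
    FileStatus.PROCESSED: FileStatus.DELETED,
}


class ExchangeFileLog:
    def __init__(self) -> None:
        self._status: Dict[str, FileStatus] = {}

    def status(self, filename: str) -> Optional[FileStatus]:
        return self._status.get(filename)

    def advance(self, filename: str) -> FileStatus:
        """Move a file to its next status; a deleted file cannot advance."""
        current = self._status.get(filename)
        if current not in _NEXT:
            raise ValueError(f"{filename} is already {current.value}")
        self._status[filename] = _NEXT[current]
        return self._status[filename]
```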

Referring to FIG. 14, the step 1306 in which call records 
are validated can be expanded as follows. At this point, steps
1401 and 1402, the exchange file is copied from the DDC 5
and the file header and first APDU header 52 are validated,
steps 1403, 1405. If either fails, a file error log is created,
step 1412. If both are acceptable, the call records are each
validated, steps 1407, 1408, and a call record error log is
created, step 1409, if one fails. Validation is repeated for
each APDU 51. Whether validation has shown all is correct
or errors have been logged, the audit trail is updated, step
1413, and File Process moves on to step 1307 as described above.

Referring to FIG. 15, files which have been validated 
during the File Process are now ready to go to the Company 
Box 8. At this stage, the file structures are broken down so 
that the individual call records 53 can be sorted according to 
the billable entity they are relevant to. The call records 53 
are now written to physical files, step 1503, for the different 
billable entities. 

5 (iii) DDC File Deletion Process 

Once a data file has been successfully downloaded from 
the DDC 5 to the streamer 6, and the data has been expanded 
and streamed to the appropriate Company Box 8, the data
file must be deleted from the FTAM filestore on the DDC. 
The streamer 6 will delete the file using an FTAM delete 
request a number of hours after the file has been secured on 
either the company box 8 (or local storage for the company 
box 8 if the link to the company box 8 has gone down). The 
exact time between the data being secured and the files being 
deleted can be set on a per DDC basis. 
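The per-DDC deletion delay reduces to adding a configurable number of hours to the time the file was secured. Two points are assumptions here. The default of 12 hours is read from DDC_DELAY_IN_DELETE in the example event cycle, which counts from when the file is marked for deletion. The sketch treats the "secured" time as the starting point. `deletion_due` and `ready_for_deletion` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Dict

# Default from the example cycle (DDC_DELAY_IN_DELETE=12), read as hours.
DEFAULT_DELETE_DELAY_HRS = 12


def deletion_due(secured_at: datetime, ddc: str,
                 delay_hrs_by_ddc: Dict[str, int]) -> datetime:
    """Time at which the FTAM delete request for a secured file may be sent."""
    hours = delay_hrs_by_ddc.get(ddc, DEFAULT_DELETE_DELAY_HRS)
    return secured_at + timedelta(hours=hours)


def ready_for_deletion(secured_at: datetime, ddc: str,
                       delay_hrs_by_ddc: Dict[str, int], now: datetime) -> bool:
    return now >= deletion_due(secured_at, ddc, delay_hrs_by_ddc)
```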
5 (iv) Data Analyzer Process 

Referring to FIG. 16, the step of validating call records in 
FILE PROCESS, step 1306 in FIG. 13, generates a file error 
log, step 1412, and a call record error log, step 1409. The 
data analyzer 7 runs two processes, the "DAPROCESS" and 
the "SUSPENSE FILE PROCESS", which are initiated 
during the boot-up sequence of the HP9000. 

DAPROCESS monitors continuously whether data which 
has been sent by the Streamer 6 is available to be processed 
by the Data Analyzer 7. This data will always exist initially
as the original exchange file, irrespective of whether the data
contained individual call records which could not be
streamed, or the failure was at file level.

As long as the Data Analyzer 7 is not flagged as shut
down, step 1602, DAPROCESS will first pick up the earliest
file error log to be processed, step 1603, and check whether
it was a failure at file/APDU level or at call record level, step
1604.

Referring to FIGS. 16, 17 and 20, if the failure was at call
record level, DAPROCESS will pick up the next call record
error log with regard to the file, step 1702, and send the
relevant call record to the ART IM rule base for correction,
step 2000. If the failure was at file level, the whole exchange
file has been rejected by the Streamer 6. In this case, the
complete file is loaded into memory, step 1606, and the file
header and APDUs 51 are sent to the ART IM, step 1607, for
correction.
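The dispatch performed by DAPROCESS (steps 1602 to 1607 and FIG. 17) might be sketched as follows. The following names are hypothetical and introduced only for illustration: the `FileErrorLog` record, the `"FILE"`/`"CALL_RECORD"` level encoding, and the two callbacks standing in for the ART-IM interfaces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileErrorLog:
    file_name: str
    logged_at: int                  # ordering key; earliest is processed first
    level: str                      # "FILE" or "CALL_RECORD" (assumed encoding)
    call_record_errors: List[str] = field(default_factory=list)


def da_process(logs, shut_down, send_record_to_rules, send_file_to_rules):
    """Work through file error logs, earliest first, until shut down."""
    for log in sorted(logs, key=lambda entry: entry.logged_at):   # step 1603
        if shut_down():                                           # step 1602
            break
        if log.level == "CALL_RECORD":                            # step 1604
            # Call record level: each errored record goes to the rule base.
            for record in log.call_record_errors:                 # step 1702
                send_record_to_rules(log.file_name, record)
        else:
            # File level: the whole exchange file was rejected.
            send_file_to_rules(log.file_name)                     # 1606-1607
```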

There are several possible outcomes of the analysis done by the Data
Analyzer 7. Fixable data will be sent to the ART IM to be
corrected, and subsequently can be validated and streamed 
to the Company Box 8. If a routing error is involved, the data 
may be put into suspense in case there is a problem with a 
record of routing information somewhere in the system, for 
instance because it needs updating. It may be possible to 
validate call data after all, once the routing information has 
been corrected. If a whole file is unreadable, it might have 
to be sent, still in binary format, to a Binary File Dump. If 
data, for instance a file, is determined by the ART IM to be 
unfixable, and the error is not concerned with routing so as 
to justify suspension, it may be archived. The data will never 
be billed but may be used in analysis to identify long term 
or significant problems which themselves can be put right 
and so avoid losing billable items in the future. 
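The outcomes described above amount to a routing decision for each item of data. In the minimal sketch below, the `Outcome` names and the `classify` signature are hypothetical. The three flags are assumed to have already been established by the ART-IM analysis. Checking for a routing error before checking fixability is one reading of the text, not a stated rule.

```python
from enum import Enum


class Outcome(Enum):
    STREAM = "correct and stream to the Company Box"
    SUSPENSE = "suspend pending routing data update"
    BINARY_DUMP = "send to the binary file dump"
    SUMP_OR_ARCHIVE = "sump or archive; never billed"


def classify(readable: bool, fixable: bool, routing_error: bool) -> Outcome:
    """Route one item of data according to the outcomes in the text."""
    if not readable:
        return Outcome.BINARY_DUMP      # whole file unreadable
    if routing_error:
        return Outcome.SUSPENSE         # may become billable later
    if fixable:
        return Outcome.STREAM
    return Outcome.SUMP_OR_ARCHIVE      # kept only for analysis
```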

Returning to FIG. 16, the main DA PROCESS, having
used the ART IM to run checks at steps 1605 and 1607, will
next sort out files which have been returned from the ART
IM as unfixable. If they cannot even be read, step 1608, they
are forwarded to the binary file dump. These files can
potentially still be read, since they may be in hexadecimal,
octal or ASCII format, and might be used later for
analysis. Alternatively, files might be readable by the Data
Analyzer but still rated "unfixable" by the ART IM.
These are, at step 1609, loaded to a "SUMP" database
where, again, they will never provide billable data but can be
queried and analysed.

If a file has been sent to the ART IM and was fixable, the
ART IM will return each call record sequentially for
validation, steps 1610 and 1611. DA PROCESS will then
validate these call records, first by checking for a routing
failure, step 1612, and creating a call record error log, step
1615, in the event that there is a call record failure. These logs
are picked up and returned through the ART IM, steps 1603 to
1605 and 1701 to 1703. If the call record is acceptable, it will
be streamed to the Company Box 8, via steps 161 to 1618.

Referring to FIG. 18, where there has been a call record 
routing failure, detected at steps 1612, 1704, or 1907 (see 
below), the call records are categorised and suspended. That 
is, the failure is analyzed to the extent that it can be matched 
to an existing routing error pattern, step 1802, and then the 
call record is added to an existing pattern file which contains 
all call records showing the same routing error pattern, step 
1803. These pattern files are held in suspension, the primary 
BT PSTN network management and data center being 
notified. A separate process, SUSPENSE FILE PROCESS, 
then deals with these files. 
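Categorising and suspending by route pattern (steps 1802 and 1803) is essentially a group-by on the routing fields. The sketch rests on a hypothetical assumption: that a route pattern is keyed on the routing fields TUN, route group, route number, NNI and Nodal Point Code. The text does not state exactly which fields form the key. `CallRecord`, `route_pattern` and `suspend` are illustrative names.

```python
from typing import Dict, List, NamedTuple, Tuple


class CallRecord(NamedTuple):
    record_id: str
    tun: str                 # Telephony Unit Number
    route_group: str
    route_number: str
    nni: str
    nodal_point_code: str


PatternKey = Tuple[str, str, str, str, str]


def route_pattern(rec: CallRecord) -> PatternKey:
    # Hypothetical key: the routing fields that define a route pattern.
    return (rec.tun, rec.route_group, rec.route_number,
            rec.nni, rec.nodal_point_code)


def suspend(failed: List[CallRecord],
            pattern_files: Dict[PatternKey, List[CallRecord]]) -> None:
    """Append each routing failure to the pattern file for its route
    pattern (step 1803), creating a new pattern file when no existing
    pattern matches."""
    for rec in failed:
        pattern_files.setdefault(route_pattern(rec), []).append(rec)
```

Because every record with the same routing error shares one pattern file, a single routing-data fix can later release all of them at once.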

SUSPENSE FILE PROCESS is an important aspect of the
data analyzer 7 because it takes a category of errored files,
which can potentially be corrected, out of the "mainstream"
of data processing. These files may only have been picked up
as errored because routing data somewhere in the system has
not been updated. They are potentially billable. By means of
SUSPENSE FILE PROCESS, the primary network management
and data center has the opportunity to update routing
data in the system and still catch files previously found
errored. Further, because call records are appended to an existing
pattern file, a "Route Pattern Suspend File", for a particular
route pattern, files can be selected for re-attempting validation
simply by running a selected Route Pattern Suspend
File.

Referring to FIG. 19, as long as the process is not shut
down, step 1902, SUSPENSE FILE PROCESS starts by
locating the earliest routing pattern which has been
amended, for instance, by the network management and data
center, step 1903. It will then pick up the next suspended file
containing that routing pattern, step 1904, and attempt to
validate the call records, steps 1905 and 1906. There may, of
course, be more than one routing error in the call record. If
that is the case, SUSPENSE FILE PROCESS will revert to
step 1801 on FIG. 18 and create a routing error entry in a
routing error pattern file, thus re-suspending the call record.
However, if there is no other routing failure, SUSPENSE
FILE PROCESS will attempt to stream the call record to the
Company Box 8 by reverting to step 1501 on FIG. 15. The
process runs through all of the call records in the suspended
file in this way, step 1910, and through all the files which
have been suspended with respect to that particular route
pattern, step 1911.
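The loop of FIG. 19 can be sketched as below. This is a hypothetical outline under stated assumptions. `pattern_files` maps each route pattern to its suspended files, each held as a list of call records. The callbacks `validate`, `stream` and `resuspend` stand in for steps 1905 to 1906, the FIG. 15 streaming path and the FIG. 18 suspension path respectively.

```python
def suspense_file_process(amended_patterns, pattern_files, validate,
                          stream, resuspend, shut_down=lambda: False):
    """amended_patterns: route patterns in the order they were amended,
    earliest first. pattern_files maps a route pattern to its suspended
    files, each a list of call records."""
    for pattern in amended_patterns:                      # step 1903
        if shut_down():                                   # step 1902
            return
        for suspended_file in pattern_files.pop(pattern, []):  # 1904, 1911
            for record in suspended_file:                 # step 1910
                if validate(record):                      # steps 1905-1906
                    stream(record)       # on to step 1501, FIG. 15
                else:
                    resuspend(record)    # back to step 1801, FIG. 18
```

Only patterns that have been amended are revisited. Everything else stays parked outside the main processing flow.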

FIG. 22 shows the process interactions
between the streamer system 6, the company box 8 and the
data analyzer 7. The main process area of the streamer 6 is
the one called "FILEPROCESS". This does all the validation
and intrinsic operations on a file. In the data analyzer
area there is the "IDA FILEPROCESS", which enters data to
the expert system. Importantly, this process triggers
"SUSPENSE FILEPROCESS" by appending data to a Route
Pattern Suspend File. It is this which avoids a large backlog
of data building up, because SUSPENSE FILEPROCESS operates
outside the main IDA FILEPROCESS. Another area of interest
is the "SUMPDATABASE", which receives output from the
"SUMPLOADER". Although data in the SUMPDATABASE cannot
be put right, it can be queried and analyzed so that,
potentially, rules at the IDA FILEPROCESS can be changed
so that subsequent data can be re-streamed.

In FIG. 22, processes are shown in circles, the Company
Box 8 as a block, data files, logs and the like
between open-ended parallel lines, and archived data is
represented by the conventional symbols for databases.

The process and stored-data interactions referenced (a)
to (y) on FIG. 22 can be listed as follows, the arrowheads
denoting the relevant transfer directions:

a) NNI and list of file names to be processed, transferred
b) Exchange file log created with STATUS A
c) DIRINDEX file accessed
d) FTAM exchange file copied
e) FTAM exchange file deleted
f) Exchange files read where STATUS is P
g) STATUS set to D if exchange files deleted successfully
(at (e) above)
h) Exchange file log read where STATUS is A
i) Exchange file log data updated; STATUS set to P
j) File is in error, so file error log created
k) Call record is in error, so call record error log created
l) File copied to Data Analyzer directory if file is in error
m) File error log read
n) Call record error log read
o) Raw (binary) data file looked up
p) Data appended to route pattern suspend file for this
route pattern
q) Entry made in route error pattern
r) ART/IM created closest matches
s) ART/IM has identified that this data cannot be fixed.
Data is placed in the SUMP for further analysis or
deletion
t) User has identified that the problem cannot be fixed. File is
placed into the sump for further analysis or deletion
u) When file structure is unintelligible, file thrown into
binary file dumps
v) Streamed file created
w) SUSPEND FILE PROCESS is initiated by the status on
route error pattern being set to READY. If problems
persist, then the Count field is updated and the status set to
SUSPENDED
x) Closest matches are updated if the chosen solution fails
to fix the problem
y) Streamed file created
5 (v) Entity Life Histories 

Referring to FIGS. 23 to 30, entity life history diagrams
show the statuses that a record within an entity can be
in and, from each status, which other statuses can be reached by
which actions. In each of these Figures, the statuses are
simply identified by the reference numeral 2300, and the
definitions of the statuses are given below.
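An entity life history is, in effect, a small finite state machine. A minimal sketch for the exchange file log of FIG. 29 follows. Its A, P and D statuses are set at interactions (b), (i) and (g) of FIG. 22. The transition table and class name are illustrative assumptions, not taken from the IEF model.

```python
# Allowed transitions for the exchange file log (FIG. 29), as read from
# interactions (b), (i) and (g) of FIG. 22.
ALLOWED = {
    "A": {"P"},   # ACTIVE -> PROCESSED once the streamer has processed it
    "P": {"D"},   # PROCESSED -> DELETED once the FTAM delete succeeds
    "D": set(),   # DELETED is terminal
}


class ExchangeFileLog:
    def __init__(self, file_name):
        self.file_name = file_name
        self.status = "A"            # log is created with STATUS A

    def move_to(self, new_status):
        """Change status, rejecting any transition not in the life history."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status
```

The same table-driven pattern would extend to the other entities in FIGS. 23 to 30 by supplying their own transition tables.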
FIG. 23: File error log;
READY — the file is ready to be streamed by the data
analyzer 7.
SUSPENSE — either the whole file or at least one call
record within the file has been sent to the suspense area.
BIN — the file could not be read by the data analyzer 7 and
has been sent to the bin area.
SUMP — the whole file has been sent to the sump area.
COMPLETE — the data analyzer 7 has streamed the file
and any of the file's call records in the suspense area
have been successfully re-streamed or archived.
FIG. 24: Call record error log;
READY — call record is ready to be streamed by the data
analyzer 7.
SUSPENSE — call record has been sent to the suspense area.
SUMP — call record has been sent to the sump area.
[Status illegible in source] — [description illegible] (ie ARCHIVED).
COMPLETE — the data analyzer 7 has streamed the call
record successfully.
VAL_FAILURE — there are differences in the ART-IM
and IEF validation procedures.
FIG. 25: Route error pattern;
UNSELECTED — created by ART-IM and awaiting analysis
by the data analyzer user, or re-streamed after analysis
but failed.
PENDING — selected by the data analyzer user for analysis.
READY — the data analyzer user has completed analysis and the
pattern is ready to be re-streamed.
CLOSED — successfully re-streamed or ARCHIVED.
FIG. 26: Closest matches;
UNSELECTED (OR NULL) — generated by ART-IM.
SELECTED — selected by the data analyzer user for analysis.
FIG. 27: Sump file log;
SUMP — a file is ready for the SUMP_PROCESS.
PROCESSING — the file is ready to be viewed by the data
analyzer user.
ARCHIVED — the file has been archived.
FIG. 28: File route error link;
SUSPENDED — the file is in the suspense area.
COMPLETE — the file has been successfully re-streamed
from the suspense area.
FIG. 29: Exchange file log;
A(CTIVE) — exchange file is being processed by the
streamer.
P(ROCESSED) — exchange file has been processed by the
Streamer.
D(ELETED) — exchange file has been deleted by the
Streamer.
FIG. 30: District data collector;
(All statuses are changed by Streamer 6 users via
SQL*Forms.)
P(REBIS) — DDC is prebis.
L(IVE) — DDC is live.
C(EASED) — DDC has been ceased.

Referring to FIG. 6, it will be seen that the Streamer
6/Data Analyzer 7 software architecture includes IEF external
action blocks (EABs) 62. The EABs 62 are used where
it is inappropriate or not possible to implement a function
within the IEF. For instance, the following functions might
be carried out by means of the EABs 62:

"Add call record to suspense"

This module will create a new entry of a call record
within a linked list containing call records for an exchange
file which are to be sent to the suspense file.

"Add call record to archive"

Creates a new entry of a call record within a linked list
containing call records for an exchange file which have been
fixed but cannot be re-streamed, to the archive directory.

"Add network operator record"

Checks whether this is a new Network Operator and, if so,
will create a new entry in the linked list of
"network_operator_structure". If it is an already used network
operator name, it will add a linked list entry into the linked
list of call records for that network operator. Where a fix has
been applied to a call record, it will update the
"call_record_error_log" with the "network operator record"
identity and streamed call record sequence number.

"Call record to IDA rules"

Passes a single call record to the data analyzer ART-IM
rule base. The call record is identified by the APDU
sequence number and call record sequence number passed in
by IEF. The data structure loaded in memory is then
searched for the call record and owning APDU and
exchange file header data. This data is then fed into the rule
base and validated. Any errors found will each generate a
call record rule log row entry. The call record error log
record status will also be updated by the rule base.

"Commit"

Commits the current database changes.

"Create DDC process"

Creates an occurrence of the DDC process which will be
responsible for polling a particular DDC. It will create/open
a fifo (first in/first out) to the child process and will write
into the fifo the value of the DDC NNI.

"Create file process"

Creates the process which will perform the task of streaming
the file names passed in the array of file names.

"Delete file from DDC"

Deletes a file using the FTAM protocol from disc on the
DDC.

"Delete data analyzer file"

Deletes a file from the streamer/data analyzer directory.

"[Name illegible in source]"

Deletes a file from the suspense file directory.

"File to bin"

Passes a file which cannot be read into the ART-IM rule
base to the binary file dump.

"File to data analyzer rules"

Passes a whole file to the data analyzer ART-IM rule base
and initializes the rule base. The initialisation of the rule
base involves clearing old data, selecting default and routing
reference data, and populating the rule base with that
data. The data analyzer binary file is then loaded into a data
structure in memory. This data is then fed into the rule base
and validated. Any errors found will each generate the
appropriate rule log entry. Call record error logs will be
created by the rule base where appropriate, together with
route error pattern, closest matches and file route error link
records. Once validated, the rule base will return a validation
status to IEF and retain its internal data for later retrieval.

"File to sump"

Passes a file which cannot be fixed by the ART-IM rule
base to the sump.

"FTAM HLCOPY"

Copies, using the FTAM protocol, the DDC file name
from the DDC FTAM address using the DDC user name,
DDC password and DDC account to the Streamer 6. The
user name, password, account and FTAM address of the
Streamer 6 can be defaulted to NULL if required. The
routine is not called directly from the IEF and hence does not
return an IEF-style status.

"Get DDC process parameters"

Creates or opens a fifo in the streamer/TMP directory
which will be named "DDC_process_fifo<PID>". It will
read the values of the DDC NNI from the fifo, the data
having been inserted into the fifo by the "create_file_process"
routine.

"Get file process parameters"

Creates or opens a fifo in the streamer/TMP directory
which will be named "file_process_fifo<PID>". It will
read the values of the above variables from the fifo, the data
having been inserted into the fifo by the "create_file_process"
routine.
"Get exchange file"

Copies a file using the FTAM protocol from disc on the
DDC, straight onto disc on the Streamer 6. The file will then
be read into memory on the Streamer 6 and then renamed
into the raw record backup directory, from where it will be
archived. This module calls "map_data_structure_to_file"
in order to set up the initial pointers to the first billing
record, the first APDU, and the header and trailer APDUs.

"Get no. invalid data analyzer APDUs"

Returns a count of invalid APDUs found when re-processing call
records which have failed the Streamer validation process.

"Map data analyzer file" 

Reads a file into memory for subsequent processing. 
"Process active" 

Establishes whether a particular PID is active and returns 
a flag accordingly. 

"Read exchange file header" 

Uses the pointers to the header and trailer APDU to return 
in a structure all of the fields from the header and the APDU 
type from the trailer. 

"Read data analyzer exchange file header" 

Uses the pointers to the header and trailer APDU to return 
in a structure all of the fields from the header and the APDU 
type from the trailer for a file which has been sent to the data 
analyzer. 

"Read first DIRINDEX record" 

Copies, using the FTAM protocol, the DIRINDEX file 
from the DDC to temporary storage, and opens the file and 
returns the first record to the caller. 

"Read next APDU" 

Returns the APDU structure pointed to by the current 
APDU pointer and sets the current APDU pointer to the next 
APDU. Also sets the current billing record pointer to the first 
billing record within the returned APDU, and copies and 
byte reverses the data into the current APDU array. 

"Read next DIRINDEX record"

Reads the next record from the DIRINDEX file on the 
DDC. 

"Read next data analyzer record" 

Returns the next billing record output from the ART-IM 
rule base. Successfully processed records will appear first, 
followed by those which require sending to the suspense file. 

"Read next suspense record" 

Returns the next billing record output from the suspense 
file. 

"Read next record" 

Returns the billing record currently pointed to by the
current billing record pointer, and sets the pointer to the next
billing record if this record is not the last in the current
APDU. (This is determined using the APDU length and the
minimum length of a billing record.)
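The pointer logic of "Read next record" can be illustrated with a byte-level sketch. The record layout here is a hypothetical assumption: each billing record is taken to start with a one-byte field giving its total length. `MIN_BILLING_RECORD_LEN` is an invented value. The "is this the last record?" test uses the APDU length and the minimum record length, as the text describes.

```python
# Hypothetical minimum billing record length in bytes (not from the source).
MIN_BILLING_RECORD_LEN = 4


class ApduCursor:
    """Walks the billing records in one APDU payload.

    Assumes, for illustration only, that each billing record begins with a
    one-byte field giving the length of the whole record."""

    def __init__(self, apdu: bytes):
        self.apdu = apdu
        # Current billing record pointer; None once the APDU is exhausted.
        self.pos = 0 if len(apdu) >= MIN_BILLING_RECORD_LEN else None

    def read_next_record(self):
        """Return the record at the current pointer, then advance the
        pointer only if another record can still fit in the APDU."""
        if self.pos is None:
            return None
        length = self.apdu[self.pos]
        if length < MIN_BILLING_RECORD_LEN:   # malformed or padding: stop
            self.pos = None
            return None
        record = self.apdu[self.pos:self.pos + length]
        next_pos = self.pos + length
        fits = next_pos + MIN_BILLING_RECORD_LEN <= len(self.apdu)
        self.pos = next_pos if fits else None
        return record
```

Trailing bytes shorter than a minimum-length record are treated as the end of the APDU rather than as another record.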

"Rename network operator files" 

Renames any network operator files that have been
written to the temporary directory into the operational
directory, ready to be processed by the Company Box 8.

"Sleep" 

Sleeps for the specified number of seconds.
"Stream file" 

Dumps the file in memory to the data analyzer ready for 
data analyzer processing. 

"Stream file network operator" 

Uses the pointer to the first network operator to get to all
of the validated, expanded records for that operator. It then
attempts to write the records from the linked list into an
nfs_temporary directory. If successful, the file is renamed
into the nfs_directory. If the file cannot be reopened on the
nfs_temporary directory, the file is opened on the local
temporary directory, and upon successful writing the file is
renamed into the local directory.
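The write-then-rename pattern in "Stream file network operator" ensures that a consumer never sees a half-written file. A minimal sketch follows. The function name, the newline-per-record format and the directory parameters are assumptions. It uses Python's standard `os.replace`, which renames atomically when source and target are on the same filesystem. A failure to write into the NFS temporary directory triggers the local fallback.

```python
import os


def stream_operator_file(records, file_name, nfs_tmp, nfs_dir,
                         local_tmp, local_dir):
    """Write records to a temporary directory, then rename into the target
    directory so the Company Box never sees a partly written file.
    Falls back to the local directories if the NFS temporary directory
    cannot be written."""
    for tmp_dir, final_dir in ((nfs_tmp, nfs_dir), (local_tmp, local_dir)):
        tmp_path = os.path.join(tmp_dir, file_name)
        try:
            with open(tmp_path, "w") as f:
                f.writelines(r + "\n" for r in records)
        except OSError:
            continue   # e.g. the NFS link to the Company Box is down
        final_path = os.path.join(final_dir, file_name)
        os.replace(tmp_path, final_path)   # atomic on the same filesystem
        return final_path
    raise OSError("could not write to NFS or local temporary directory")
```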
"Stream file RRB" 

Dumps the file in memory to the raw record backup 
directory. 

"Write console" 

Writes a message to a network management workstation. 
"Write to suspend file"

Writes the records from the linked list of suspended call 
records for an exchange file into the suspend directory. 
"Write to archive file" 

Writes the records from the linked list of archive call 
records for an exchange file into the archive directory. 

6. FIGS. 31 TO 35: EXPERT SYSTEM; ART-IM 
6 (i) Overview 

The expert system uses the facilities of the ART-IM 
knowledge based expert system tool kit supplied by Infer- 
ence Corporation. It is a knowledge/rule base programming 
system which allows for a flexible model of decision 
making, and therefore modelling of the real world, within 
the knowledge hierarchy, as well as providing a more 
heuristic method for problem solving. The tool kit contains
the ART-IM language as well as an integrated editor, an
interactive development environment, tools for the
development of end-user interfaces, a method of deploying
run-time versions of developed applications and the facility
to interpret external data intelligently.

In the data analyzer 7, the expert system is split into two 
subsystems, the rule base and the case base. In general, the 
case base is used to deal with routing based errors, and the 
rule base is used for defaulting and calculable errors. Both 
make use of the ART-IM functionality. 

The rule base uses the ART-IM procedural language constructs,
including rules, functions and methods. Each error is defined
within a schema, and instances of these schemas are used on
the data structures. All schemas within an in-data object
hierarchy are populated via the IEF/ART-IM interface using
the "DEF-EXTERNAL_FUN" facility of ART-IM.

The mechanism of program flow control used by ART-IM 
is very different from sequential statement-by-statement 
flow, as usually found in programming languages. Referring 
to FIGS. 31 and 32, the expert system holds all its internal 
data, that is schemata and facts, in a pattern net 3100. This 
is represented in FIG. 31 by a plurality of patterned circles, 
each representing a piece of internal data (a schema or a 
fact). This data can be set up by 

loading an ART-IM test case file (more usually done in a 
development/unit testing context). 

by populating from an external source (eg Oracle or IEF; 
more usual in a production/system test environment). 

by generating from ART-IM rules (used as very flexible 
"working storage" eg generation of error schema after 
validation test failure). 

Once set up, data is compared directly with the conditions 
specified within the rules. A rule resembles an 
"IF<conditions>THEN<action>" of a more traditional pro- 
gramming language. If conditions of the rule match exactly 
an instance of data, an activation is created within an 
associated agenda 3105. All instances are checked against 
all rules. In terms of performance, the pattern net and rule 
conditions are managed by an efficient pattern-matching
algorithm within the ART-IM run time system. 

At the end of the evaluation part of the cycle, all rule 
activations are placed in order on the agenda stack. The first 
rule activation on the stack will be fired. The order of 
appearance of activations defaults to random unless 
salience, that is priority of rules, is set by the developer. 
Referring to FIG. 32, after firing of the topmost rule
activation on the agenda 3105, the action of the rule has
actually changed data in the pattern net, which will in turn
alter what appears on the agenda stack following the next
evaluation cycle.

It might be noted that the data instance causing the initial
firing (the circled instance 3110) will not be re-evaluated,
thereby avoiding continuous looping. However, if the data
within the data instance changes and the new pattern
matches a rule condition, then a rule activation will be
created.

The ART-IM run will finish when no matching conditions
and patterns are found, or when all matching conditions and
patterns have already fired rules.

The above can be summarized as follows:

1) rule activations are generated by matching data patterns 
with rule conditions 

2) rules can, by default, fire in any order although priori- 
ties can be set 

3) all data is evaluated in parallel 

4) re -evaluation occurs each time a rule has fired 

5) the same rule can fire many times during a run, 
depending on the number of matching data instances 

6) rule conditions are sensitive to changes in the pattern
net

7) ART-IM stops if no matching rule conditions or pattern 
net data is found or all matched activations have fired 
already. 
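The match-select-fire cycle summarised in points 1) to 7) can be imitated with a small forward-chaining loop. This is a toy sketch in Python, not ART-IM. Facts are plain tuples, and each condition tests one fact at a time. Refraction is modelled by remembering which (rule, fact) pairs have already fired, so a changed fact can match again while an unchanged one cannot. Ties in salience keep the matcher's order here rather than being random. `Rule`, `run` and the example `strip_suffix` rule are hypothetical names.

```python
class Rule:
    """IF condition(fact) THEN action(facts, fact); higher salience first."""

    def __init__(self, name, condition, action, salience=0):
        self.name = name
        self.condition = condition
        self.action = action
        self.salience = salience


def run(rules, facts):
    """Repeat evaluate/select/fire cycles until no unfired activation remains."""
    facts = set(facts)
    fired = set()   # refraction: (rule name, fact) activations already fired
    while True:
        # Evaluation: every fact is checked against every rule condition.
        agenda = [(rule, fact) for rule in rules for fact in facts
                  if rule.condition(fact) and (rule.name, fact) not in fired]
        if not agenda:
            return facts   # no matches left, or all matches already fired
        # Order the agenda by salience; ties keep the matcher's order.
        agenda.sort(key=lambda act: -act[0].salience)
        rule, fact = agenda[0]
        fired.add((rule.name, fact))
        # Firing may add, remove or change facts; the next cycle re-matches.
        facts = set(rule.action(facts, fact))


# Example rule: drop a trailing hex digit A-F from a dialled digit string.
strip_suffix = Rule(
    "strip-suffix",
    lambda f: f[0] == "dialled" and f[1][-1:] in ("A", "B", "C", "D", "E", "F"),
    lambda facts, f: (facts - {f}) | {("dialled", f[1][:-1])},
)
```

For example, `run([strip_suffix], {("dialled", "0123A")})` leaves the single fact `("dialled", "0123")`. The repaired fact no longer matches, so the run stops.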

Referring to FIG. 33, the rule base system is based on an
object hierarchy as shown. Each of the objects 3300 is
defined in ART schemas, and the connecting lines between
objects 3300 depict inheritance from the object above.

The exchange file, APDU and call record contain slots for 
each data item in their structure. Each slot has a correspond- 
ing slot in the appropriate default object to declare whether 
the resultant has a default value, a calculable value or is an 
unmodified field. The rule base uses the default system to
check what form an error correction must take, if allowed.

The above covers the data schemas. With regard to error 
schemas, every possible data analyzer error has its details 
described within an appropriate schema. Each error descrip- 
tion and its instances contains a slot for each of the follow- 
ing: 

The object to which the error relates, eg an exchange
file.

An error description. 
The affected slot. 

The specific data object for an error instance. 
The name of the repair value. 
The source of the error. 
The resultant repair value. 
The rule position in fire order. 
The value of the slot prior to any fix being applied. 
6 (ii) Rule Base Generic Rules 

The rule base operational flow is controlled by a number
of generic rules, these performing the following functions:
for each occurrence of an error trigger, that error's repair
method is fired to generate a repair value and its fire
order is allocated,
for a fixable error where there is only one affected slot, the
affected slot is updated with the repair value generated
and the time stamp of the change is stored with the
instance of the error,
for each instance of an error where the repair description
declares the error type as suspendable, the data item
affected is moved to the suspense file and the time
stamp of the move is stored with the instance of the error,
for each instance of an error where the repair description
declares that the error type is sumpable, the data
item affected is moved to the sump and the time stamp
of the sumping of the file is stored with the instance of
the error,
a record is created on the file structure rule log for each
fix on an APDU or exchange file,
an Oracle record is created on the call record error log for
each fix on a call record with the appropriate error
information.
fixable errors 

1) Default values can be allocated to the following fields: 
APDU type and trailer 

billed call indicator 
called party clear 
PBX suffix
record use 
record type 
DDC time stamp 
header APDU type 
class of data transfer 
format version number 
node time stamp 
part file indicator 
table size 
trailer APDU type 
called party clear 
application 

2) The following errors are calculable:
APDU length; the length of the APDU.
APDU count; the length of the APDU sequence.
End APDU sequence number; the start sequence number plus
the number of valid APDUs.
Start APDU sequence number; obtained from the
sequence number of the first APDU in the exchange
file.
Dialled digit count; the length of the dialled digit string.

There are error exceptions with regard to the above, such
as where the checksum for an APDU shows an error.
Errors of this type are immediately sumped within the rule
base. Some errors with regard to the APDU sequence result
in the complete range of sequence numbers being
re-sequenced from "1", and the related exchange files being
updated. The last digit of a dialled digit string may be
a character between A and F. The repair value here is the
dialled digit string minus the last digit.
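The calculable repairs above reduce to simple arithmetic. The sketch below applies the rules as the text states them: the end sequence number is the start sequence number plus the count of valid APDUs, and a trailing A to F digit is dropped. The function name and the dictionary keys are hypothetical. Recomputing the dialled digit count after the suffix repair is an assumption, not something the text states.

```python
HEX_SUFFIXES = ("A", "B", "C", "D", "E", "F")


def calculated_repairs(apdu_seq_numbers, valid_apdu_count, dialled_digits):
    """Return repair values for the calculable fields of one exchange file.

    apdu_seq_numbers: sequence numbers of the APDUs, in file order.
    valid_apdu_count: number of valid APDUs in the file.
    dialled_digits: the dialled digit string of one call record."""
    start = apdu_seq_numbers[0]          # from the first APDU in the file
    digits = dialled_digits
    if digits[-1:] in HEX_SUFFIXES:      # last digit between A and F
        digits = digits[:-1]
    return {
        "start_apdu_sequence_number": start,
        "end_apdu_sequence_number": start + valid_apdu_count,
        "apdu_count": len(apdu_seq_numbers),
        "dialled_digit_string": digits,
        "dialled_digit_count": len(digits),
    }
```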

non-fixable errors 

On the non-fixable error occurrence, the data item in error, 
ie a call record, is passed to the sump, as described above, 
and the appropriate error log updated. Areas which cannot be 
amended, and which therefore generate non-fixable errors 
are as follows: 

address seizure time stamp 

address completion time stamp 

either address or answer time stamp 

calling party clear time stamp 

calling line directory number 

seizure time stamp 

dialled digit string (except when the last digit is between 
A and F) 
6 (iii) The Case Base System

The routing reference case base is a case base of routing
patterns (ie TUN, route group, route number, NNI, Nodal
Point Code) plus other reference data, for example the
billable network operator name, and Live and Ceased Node
time stamps. The case base reference data is populated from
the Routing Reference schemata, which in turn are populated
from data contained within the Streamer Reference Data
subject area 3600 of the data model (see FIG. 36).

Referring to FIG. 34, it can be seen that the object
hierarchy for the case base system is similar to that for the
rule base system, shown in FIG. 33, with the addition of
three object classes 3400: "suggested solutions", "potential
solutions" and "routing reference". It might be noted that
"suggested solutions" and "potential solutions" are only
created following identification of a routing reference error
and contain mainly pointers to other data, that is, the incoming
call record in error and the routing reference schemata that are
most closely matched. The "routing reference" schemata are
created from the routing reference data on the Oracle database.

With regard to the routing case base and its initialization, the
routing case base is populated at the start of a run from
routing-description schemata. One case is created for every
routing description schema. The routing case base will be set
up with the following parameters:

Maximum of three matches.

Any matches below a threshold value of zero probability
will be ignored so as to weed out highly unlikely matches.

Only the following slots on the case base are used in
pattern matching:

TUN (ie Telephony Unit Number), route group, nodal
point code, route number, NNI, and direction.

The following are ignored for purposes of pattern matching:

Live node time stamp
Ceased node time stamp
Telecom network operator role and name

Direction is treated slightly differently for matching
purposes. It is the least significant matching slot and is
given a fixed weighting ceiling of 5% of the overall
weighting. The other slot weights will be split equally
between the remaining 95% of the overall weighting.

Pattern matching, together with other case base functions
such as setting initialization parameters, is achieved by
sending messages to the case base. The pattern matching is
done in two steps. First, an incoming call record schema is sent
to the case base, which will return the number of matches found.
Second, a retrieve-match-score message is sent, which will
determine the closeness of match for each returned case,
together with the key of the Routing Reference schema
associated with the returned case.

The case base is used for error code validation relating
to call pattern, in the cases of Nodal Point Code or Route
Group Not Found, Invalid Route Number, or Direction
Invalid, as follows:

attempt to find an exact match between each incoming
call record and a case on the Routing Reference case
base. If there is an exact match, the call record has a
valid routing pattern and no further validation with
regard to the above error will be required.

if no exact match is found, an error schema will be [...]
i) up to three potential-solution schemata, each containing
a pointer to the associated Routing Reference schema.
The potential-solution schema will also contain the
match position (ie closest, next closest etc) and a %
measure of the closeness of the match, and
ii) a pointer to the incoming call record in error.

It should be noted that the repair method will differ from
the usual repair method invoked by generation of an error
schema instance, because it will consist of a function
(routing-mismatch, which will assert the suggested-solution
schema instance and facts containing keys to the Routing
Reference schema) and another rule (generate-closest-matches,
which will trigger on generation of the facts
created by routing-mismatch and will generate one instance
of a potential-solution schema for each case base match
found).

Where node time stamp validation is concerned, the case
base will be used as follows:

to attempt to find an exact match between each incoming
call record schema and a Routing Reference schema. If
there is an exact match, the rule will then check for time
stamp discrepancies (ie the seizure time stamp should fall
between the node live and cease times) on the matching
incoming call record schema and the Routing Reference
schema. If no discrepancy exists, no further processing
associated with this error will take place.

If a time stamp discrepancy is found, an error schema will
be generated which triggers the rule base generic rules,
as above, which will apply a repair method.
The specific repair method will create one suggested-solution
schema, which will contain (see FIG. 34):

one potential-solution schema containing a
pointer to the associated Routing Reference schema.
The potential-solution schema will also contain the
match position (ie closest, next closest etc.) and a %
measure of the closeness of the match, and
a pointer to the incoming call record schema in error.

It should be noted that the repair method will again differ
from the usual repair method invoked by generation of an
error schema instance, because it will consist of a function
(node-timestamp-discrepancy, which will assert the
suggested-solution schema instance and facts containing
keys to the Routing Reference schema) and another rule
(generate_node_time_discrepancies, which will trigger on
generation of the facts created by routing-mismatch and will
generate one instance of a potential-solution schema).

6 (iv) ART-IM and Oracle Interface

Referring to FIG. 35, direct access to the ORACLE
database from ART-IM is required to exploit fully the
"parallel" validation features of the ART-IM rule base. There
are four main interfaces:

the population of Routing Reference data 3500
the population of default data 3505
the output of fix data to form an audit trail 3510
the output of routing error patterns as a precursor to
suspense data handling 3515.

Looking at the population of Routing Reference data, this
interface 3500 involves refresh of internal ART-IM schemata
from data in the Routing Reference Model
physically held within ORACLE tables:

the refresh is triggered during the initialization phase of
an ART-IM run.

generated which triggers the rule base generic rules, as existing internal ART-IM Routing Reference schema are 

above, which will apply a repair method. 65 cleared together with their casebase entries, 

the specific repair method will create one suggested- data is SELECTED from ORACLE tables from a ProC 

solution schema which will contain (looking at FIG. 34): program (EAB _JNITI ALI ZE_ID A_RULEB ASE) 
which will be used as part of two External Action Blocks (file_to_ida_rules and call_record_to_ida_rules).
Internal ART-IM schema are populated by the ProC program.

The Routing Reference Casebase is in turn populated from the internal Routing Reference schema by a function (inca_ida_initialize_casebase) as part of the casebase initialization process.

Looking at the population of default data:

the refresh is triggered during the initialisation phase of an ART-IM run.
existing internal ART-IM default (df-call-record, df-apdu etc) schemata are cleared together with their casebase entries.
data is SELECTed from ORACLE tables from a ProC program (EAB_INITIALIZE_IDA_RULEBASE), which will be used as part of two External Action Blocks (file_to_ida_rules and call_record_to_ida_rules).
Internal ART-IM schema are populated by the ProC program.

Looking at the creation of Error and Fix data, if errors are detected during incoming data validation which are associated with data that can be fixed, then an audit trail of the fixes applied by the rule base needs to be maintained:

for every file in error a row entry is created in the FILE_ERROR_LOG table. This is done by the streamer process.
for every call record in error a row entry is created in the CALL_RECORD_ERROR_LOG. This can be done by the streamer process or by ART-IM.
for every error detected and fix applied at the file structure level a row entry is created in the FILE_STRUCTURE_RULE_LOG on the ORACLE database. This is best done by the rule base using a generic rule which is triggered when all file level error detection and fixing has completed. The rule should fire once for each error detected/fix applied and, when fired, will invoke a user-defined-procedure call sql_exec_limited which does the necessary insertion.
for every error detected and fix applied at the call record level a row entry is created in the CALL_RECORD_RULE_LOG on the ORACLE database. This is best done by the rule base using a generic rule which is triggered when all call record level error detection and fixing has completed. Again, the rule should fire once for each error detected/fix applied and, when fired, will invoke a user-defined-procedure call sql-exec-immed which does the necessary insertion.
the ART-IM rules will populate the inserted values from slots on internal schemas.

Looking at the creation of Routing Error Patterns and Closest Matches data, if errors are detected during incoming data validation which are associated with data that is suspended, then a record of the incoming call record error pattern (based on TUN, NNI, route group number, route group, direction) must be stored on the ORACLE database for later suspense file processing. It is stored together with the three closest matches, which are the patterns on the routing reference model closest to the incoming call record in error. Patterns are stored following completion of all validation/fix processing. In more detail:

for every error generated that is a suspense file error, a generic rule (move_to_suspense_file_area) is fired. This assumes that no unfixable errors have been generated on the same call record; such unfixable call records are weeded out using the move-to-sump generic rule.

The rule selects the pattern in error from the database, and, if the pattern in error exists:
i) tests for any entry in FILE_ROUTE_ERROR_LINK with relevant pattern, exchange file and foreign keys;
ii) if the entry exists, no further action is required;
iii) if the entry does not exist, inserts a row entry into the FILE_ROUTE_ERROR_LINK.
If the pattern in error does not exist:
iv) inserts a row entry into a ROUTE_ERROR_PATTERN table populated by error pattern data from incoming call records;
v) inserts a row entry into FILE_ROUTE_ERROR_LINK;
vi) inserts up to 3 row entries into the CLOSEST_MATCHES table, populated by the routing reference patterns found by previous casebase processing to be closest to the route pattern in error.

A user defined procedure is used to pass SQL commands to ORACLE.
The ART-IM rules will populate the inserted values from slots on internal schemas.

FIGS. 20, 21, 37 TO 43; USE OF EXPERT SYSTEM BY DATA ANALYZER 7

In the flow diagrams referenced below, it might be noted that a slightly different format has been applied from that of earlier flow diagrams in this specification. That is, function calls are denoted by boxes with double vertical lines, simple statements are denoted by boxes with single vertical lines, and yes/no decisions are denoted by a simple diamond.

Use of the ART-IM expert system by the data analyzer 7 can be expressed in flow diagrams. Referring to FIGS. 16, 17 and 20, once it has been determined that there is a failure at call record level, step 1605, and the next call record error log has been selected from a file, step 1702, the relevant call records are sent to the expert system, step 2000. The expert system locates the correct APDU, steps 2005, 2010, and then the errored call record, steps 2015, 2020. The expert system then checks whether the call record is correctly itemized (step 2025), in this example according to System X itemization. If it is not, the expert system directs the call record to sump by setting the IEF status to "SUMP", step 2030, while updating the call record error log, step 2035. If the call record is correctly itemized, it is "put through" the expert system, steps 2040, 2045, 2050, and the results are assessed by the data analyzer 7, in step 1704 onwards.

Referring to FIGS. 16 and 21, it may have been decided that there is a failure at file or APDU level, step 1604. In that case, the file is loaded to memory and the file header and APDUs are sent to the expert system, step 2100. The expert system database is called up, step 2105, and the APDU schemas from the previous run are deleted, step 2110. The first test run is to refresh the expert system version of the Routing Reference Model, step 2115, which may immediately result in correcting the apparent error. If not, the default data for the expert system is refreshed, step 2120, in case, for instance, default data for the error concerned has previously been missing. If either of these is successful, the data analyzer process reasserts itself, FIG. 16, and the results from the expert system refresh steps will allow the file to go to validation of its call records, step 1611. If neither is successful, the call records themselves must be individually validated. This is described below.
Referring to FIG. 37, the function box 2125 of FIG. 21, "map header and APDU schema", expands to include loading (steps 3700 to 3725, 3735) and running (steps 3730, 3740, 3745, 3750) the expert system, ART-IM. This is done for call records from the errored files which could not be processed successfully after refreshes of the Routing Reference Model and default data. The loading process includes getting data not available on the ART database ("Foreign Keys"), for instance data from the Streamer 6, in step 3715, to enable the expert system to access the files. Having analyzed each call record, the ART supplies a status (step 3755), which may indicate that the call record is fixed or should be suspended or sumped. The data analyzer process (IEF) keeps a count of the call records to be sumped, step 3760, and sets a flag in the ART-IM, step 3765. This flag triggers clean-up by the ART-IM, step 3770, which clears out each call record and the relevant schemas so that they do not simply build up.
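The escalation described for FIGS. 21 and 37 might be sketched as below. The Routing Reference Model is refreshed first, then the default data. Only if neither clears the error is each call record analyzed, with a count kept of records to be sumped and a clean-up after each record. The function and status names are hypothetical. The refresh and analysis steps are passed in as callables rather than modelled on ART-IM or the IEF.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class Status(Enum):
    """Outcome reported for each errored call record (step 3755)."""
    FIXED = "fixed"
    SUSPEND = "suspend"
    SUMP = "sump"


def handle_errored_file(
    refreshes: Iterable[Callable[[], bool]],
    records: Iterable[dict],
    analyse: Callable[[dict], Status],
    clean_up: Callable[[dict], None],
) -> Tuple[str, List[Status], int]:
    """Escalate from cheap reference-data refreshes to per-record analysis.

    refreshes: callables tried in order (e.g. Routing Reference Model refresh,
    then default data refresh); each returns True if the file error is cleared.
    """
    # Steps 2115 and 2120: stop at the first refresh that clears the error and
    # send the file back to normal call record validation (step 1611).
    for refresh in refreshes:
        if refresh():
            return "revalidate_file", [], 0

    # Otherwise analyse each call record individually (FIG. 37).
    statuses: List[Status] = []
    sump_count = 0
    for record in records:
        status = analyse(record)   # step 3755: fixed, suspend or sump
        statuses.append(status)
        if status is Status.SUMP:
            sump_count += 1        # step 3760: count records to be sumped
        clean_up(record)           # steps 3765/3770: stop schemas building up
    return "records_analysed", statuses, sump_count
```

Trying the refreshes before any per-record work reflects the text's point that a stale Routing Reference Model or missing default data can explain a whole file of apparent errors at once.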

Referring to FIGS. 38 to 43, the application of the expert 
system file rules can also be expressed in flow diagrams, and 
the following examples are shown, the flow diagrams being 
self-explanatory: 

i) FIG. 38; ART File Rules (exchange file header) 
This can be applied to 

trailer APDU 
format version number 
filter type 
node timestamp
DDC/NMP timestamp (NMP stands for Network Mediation Processor)
class of data transfer
node cluster identity
streamer NNI
application group
part file indicator
file byte size
table size
selected APDU type

ii) FIG. 39; APDU first sequence number rule 

iii) FIG. 40; APDU last sequence number rule 

iv) FIG. 41; APDU sequence number count rule 

v) FIG. 42; ART APDU rules 
This can be applied to 
retransmission indicator 
linking field 

vi) FIG. 43; ART call record rules
This can be applied to
record use
billed all indicator
clearing cause
PBX suffix
CLI cluster identity
network circuit
network band
circuit identity
circuit number charge band
call sampling method
sampling mode
count reset indicator
value of N (where N relates to a count made while running a test set of call records, for example)
called party clear timestamp
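The move_to_suspense_file_area rule performs steps i) to vi) of the Routing Error Patterns and Closest Matches interface when it fires. That storage can be sketched as follows. SQLite from the Python standard library stands in for the ORACLE tables, and every column name is an assumption. Only the three table names come from the text.

```python
import sqlite3
from typing import Sequence, Tuple

# SQLite stands in for the ORACLE tables; all column names are hypothetical.
SCHEMA = """
CREATE TABLE ROUTE_ERROR_PATTERN (pattern_id INTEGER PRIMARY KEY, pattern TEXT UNIQUE);
CREATE TABLE FILE_ROUTE_ERROR_LINK (pattern_id INTEGER, file_id TEXT, UNIQUE (pattern_id, file_id));
CREATE TABLE CLOSEST_MATCHES (pattern_id INTEGER, position INTEGER, rr_key TEXT, score REAL);
"""


def store_suspense_pattern(conn: sqlite3.Connection, file_id: str,
                           pattern: Tuple[str, ...],
                           closest: Sequence[Tuple[str, float]]) -> int:
    """Store an error pattern (TUN, NNI, route group number, route group, direction)."""
    key = "|".join(pattern)
    cur = conn.cursor()
    row = cur.execute("SELECT pattern_id FROM ROUTE_ERROR_PATTERN WHERE pattern = ?",
                      (key,)).fetchone()
    if row is not None:
        pattern_id = row[0]
        # Steps i) to iii): link the file to the existing pattern only if not already linked.
        linked = cur.execute(
            "SELECT 1 FROM FILE_ROUTE_ERROR_LINK WHERE pattern_id = ? AND file_id = ?",
            (pattern_id, file_id)).fetchone()
        if linked is None:
            cur.execute("INSERT INTO FILE_ROUTE_ERROR_LINK VALUES (?, ?)", (pattern_id, file_id))
    else:
        # Steps iv) to vi): new pattern, its file link, and up to three closest matches.
        cur.execute("INSERT INTO ROUTE_ERROR_PATTERN (pattern) VALUES (?)", (key,))
        pattern_id = cur.lastrowid
        cur.execute("INSERT INTO FILE_ROUTE_ERROR_LINK VALUES (?, ?)", (pattern_id, file_id))
        cur.executemany(
            "INSERT INTO CLOSEST_MATCHES VALUES (?, ?, ?, ?)",
            [(pattern_id, pos, rr_key, score)
             for pos, (rr_key, score) in enumerate(closest[:3], start=1)])
    conn.commit()
    return pattern_id
```

Closest matches are written only when a pattern is first seen. A pattern that recurs in a later file only gains a new link row, which keeps one suspense entry per distinct error pattern.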
8. FIGS. 36 AND 44; COMPANY SYSTEM

Referring to FIG. 4, the output from the Streamer 6 to the
Company System 8 comprises call records sorted according
to billable entity, and validated as described above using a
data analyzer incorporating the ART-IM expert system.

The primary role of the Company System 8 is to price the 
call records and to output the priced records so that they can 
be billed to clients. However, it also has a validation role, as 
mentioned above, with emphasis on data relevant to the 
billable entity and the relationship between the billable 
entity and the operator of the first network 1. The company 
system 8 therefore incorporates or accesses a company 
system data analyzer, referred to in the following as "cIDA". 

The cIDA application can reside alongside the data ana- 
lyzer 7 which validates data from the Streamer 6, described 
above. In FIG. 4, the steps of fixing errored call records, 430, 
bulking the fixed call records, 440, and investigating unfix- 
able call records, 450, can all be carried out by means of the 
cIDA application. 

Interestingly, it has been noted that the majority of errors, of the order of 90% of the errors picked up by the company system 8, concern decode anomalies, mainly to do with "time lines" such as "123" and "emergency services" (999) calls. The bulk of the remainder of errors can be attributed to discrepancies in reference data. There can therefore be two primary aspects to building a data analyzer for use with the company system 8. The first is to tackle the records providing the majority of the errors, the decode anomalies. The second is to provide an infrastructure capable of re-presenting files back to the company system 8 after correction.
Processing Overview 

A suitable arrangement might be as follows. Error and 
warning files are sent from the company box 8 to the cIDA 
where they are loaded to specific directories, one per opera- 
tor. A single file can hold zero or many records. Preferably, 
the cIDA provides a parallel processing facility for all 
operators, running concurrently, with the capability of 
manual override. A log is maintained in order to control the 
sequence of files into and out of the cIDA. 

Once an error file has been selected for processing, the 
cIDA selects each record in turn, assuming the file is not 
empty, and evaluates the error into one of two categories: 
fixable and unfixable. Unfixable records are written to a 
table, reported on, and can later be removed from the 
database for archiving. Where a record has been deemed to 
be fixable, it might be fixed automatically by applying rules, 
or it might need manual intervention before it can be fixed. 

Each record, irrespective of error type, is inserted into an 
ORACLE database table, with all details passed from the 
company box 8 and a flag set to indicate the "state". The 
state might, in accordance with the above, be selected from 

suspense 

unfixable 

rules 
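The following is a minimal sketch of how the cIDA might assign each record one of the three states listed above. Two points are assumptions made for illustration. Fixable records that no rule can repair are mapped to the suspense state. Rules are represented as functions that return a fixed copy of the record, or None when they do not apply.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class State(Enum):
    SUSPENSE = "suspense"    # fixable, but awaiting manual intervention (assumed mapping)
    UNFIXABLE = "unfixable"  # written to a table, reported on, later archived
    RULES = "rules"          # fixed automatically by applying a rule


# A rule returns a fixed copy of the record, or None if it does not apply.
Rule = Callable[[dict], Optional[dict]]


def triage(record: dict, is_fixable: Callable[[dict], bool],
           rules: Sequence[Rule]) -> Tuple[State, dict]:
    """Classify one error record and apply the first rule that fixes it."""
    if not is_fixable(record):
        return State.UNFIXABLE, record
    for rule in rules:
        fixed = rule(record)
        if fixed is not None:
            return State.RULES, fixed
    return State.SUSPENSE, record
```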

Users, using Business Objects run at regular intervals, 
have the capability to view all records currently held and the 
state allocation they have been given. An audit log can be
held for a relevant period, such as for one month for all
"charging number string" corrections.

It might be noted that the use of automatic rules may well 
be found unnecessary. By correcting errors caused by 
decode anomalies, that is 90% of current errors, the error 
rate has been found to be reduced to 0.01%. Hence, the 
simplicity of errors arising means that a system employing
automatic rules would be overcomplicated.

Referring to FIG. 44, the dataflow routes about the data 
collection and processing system of the present invention 
can be seen. In this Figure, data stores such as files and tables
are represented by the horizontally extending rectangles 
with vertical dotted lines, and processes are represented by 
the bigger blocks, incorporating rectangles. Entities external 
to the whole system, such as the NCDB 9, are represented 
by the "lozenges". 

As already described, raw call data is entered to the 
Streamer, which converts the raw call data, validates and 
processes the call records, involving a data analyzer so far 
as necessary, and outputs validated, itemized call records to 
the company box. The company box firstly performs opera- 
tor specific validation, and secondly aggregates itemized call 
records. At this stage, the call records are priced, using 
charging information for instance from the national charging 
database (NCDB) 9, and output in summarized form to 
produce a bill report for the relevant client system 10. Other 
outputs include the expanded call records, stored in optical 
disc 71, and summarized call records for a management 
reporting system 4400. 

It can be seen in FIG. 44 that there is also an output from 
the data analyzer to an auditing system "CARDVIT" 4405.

Although embodiments of the present invention can pro- 
vide extremely detailed information for audit purposes, the 
auditing system itself is not part of the invention and is not 
therefore described herein, beyond the comments below at 
"9. AUDIT TRAIL". 

Referring to FIG. 36, a data model for the company 
system 8 shows clearly the data sources for use at charging 
and pricing by the company system 8. Much the greatest 
amount of data, the "C&P reference data", is derived from 
the NCDB 9. However, there are constraints set by the 
accounting agreement 4500 between the billable entity and 
the operator of network 1. Many issues can be dealt with 
from the network management center and the data model of 
FIG. 36 provides appropriate visibility thereto by means of 
the "telecoms network operator role", box 4505. 

The following initials, used in FIG. 36, can be expanded
as follows:

CBM     Charge Band Matrix
CB      Charge Band
NN      Kingston Communications, Hull (an operator in the UK of a network interconnected to the BT PSTN)
TE      Telecom Eireann (as above)
NCIP    National Charging Information Package (an interface to data on the NCDB)

Pricing and charging engines, complying with the type of 
constraints offered by the system of the present invention, 
are known and specific description of the charging and 
pricing engine is not therefore offered here. Indeed, although 
the data model of FIG. 36 shows all entities involved, not all 
the relationships are shown as the representation would
become too complicated. Overall, however, it must be borne
in mind that the call records handled by the company system
8 are already sorted according to billable entity. This aspect
of the data needs to be maintained, clearly, so that relevant
reports can be allocated to the correct client systems 10. This 
can be done, as indicated above, for instance by maintaining 
allocated directories for the billable entities. 
9. Audit Trail 

An arrangement as described above can provide a sophis- 
ticated audit trail. Data from the exchange at the point of 
interconnect comes in a file, and is packaged into APDUs. 
The streamer system 6 polls data off the DDCs 5 using the
FTAM protocol, the data being in binary, in call records.
The streamer system 6 validates the data against the
database containing reference data, the Routing Reference
Model, and assesses which other network operator should be 
billed. The streamer system 6 writes a full call record in 
ASCII with operator and exchange information added. 

An audit trail arises as follows. On the exchange, call 
instances are numbered with a File Generation Number 
which cycles from 0-9999. The DDC 5 also adds a sequence 
number which cycles from 0-999999, at the file level. 
Within the file, APDUs are also sequenced with an APDU
sequence number which cycles from 0-16383, being binary.

This means that there is stored a record of the number of 
records in a file, the APDU start and finish numbers, and the 
number of APDUs. 
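Because each counter wraps back to zero, continuity and counts have to be checked modulo the counter's cycle length. The sketch below uses the File Generation Number cycle of 10,000 and the DDC cycle of 1,000,000 stated above. The function names are hypothetical. The APDU sequence number would be checked the same way with its own modulus.

```python
from typing import List, Sequence, Tuple

FILE_GENERATION_MODULUS = 10_000   # File Generation Number cycles 0-9999
DDC_SEQUENCE_MODULUS = 1_000_000   # DDC file sequence number cycles 0-999999


def sequence_breaks(numbers: Sequence[int], modulus: int) -> List[Tuple[int, int]]:
    """(expected, received) pairs wherever a number does not follow its predecessor."""
    breaks = []
    for prev, cur in zip(numbers, numbers[1:]):
        expected = (prev + 1) % modulus  # wraps back to 0 after modulus - 1
        if cur != expected:
            breaks.append((expected, cur))
    return breaks


def count_from_range(first: int, last: int, modulus: int) -> int:
    """Items from first to last inclusive, allowing a single wrap-around.

    Only meaningful when fewer than `modulus` items lie in the range.
    """
    return (last - first) % modulus + 1
```

For example, with the APDU start and finish numbers recorded for a file, `count_from_range(first, last, modulus)` can be compared against the stored number of APDUs.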

Because a sequence number is added to each number at 
the exchange, it can be ensured that the company box 8 
receives the numbers in sequence, although they will not 
necessarily be processed in order. The streamer system 6 
actually processes in parallel from different exchanges at the 
same time. 

In the data analyzer, where a "pattern net" is used, by 
means of which data will "fire" a rule if it does not fit valid 
content, the analyzer can patch data items only where the 
data item concerned would not affect price or the audit trail. 
Patch in this context means set to a standard value. Hence, 
the data analyzer cannot change the call record sequence 
number because that identifies the call record. If the call 
record sequence number were to be changed, there would be 
no audit trail. 
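The restriction on patching can be expressed as a guard that refuses to set protected fields to a standard value. This is a hypothetical sketch. The text names only the call record sequence number as protected. The other field names are illustrative stand-ins for price-affecting data.

```python
from typing import Any, Dict

# Only the call record sequence number is named in the text; the other
# protected names are hypothetical examples of price-affecting fields.
PROTECTED_FIELDS = frozenset({"call_record_sequence_number", "charge_band", "call_duration"})


def patch_field(record: Dict[str, Any], field: str, standard_value: Any) -> Dict[str, Any]:
    """Return a copy of record with field set to a standard value.

    Refuses any field that would affect price or the audit trail.
    """
    if field in PROTECTED_FIELDS:
        raise ValueError(f"{field!r} affects price or the audit trail; it cannot be patched")
    patched = dict(record)
    patched[field] = standard_value
    return patched
```

Returning a copy leaves the original record unchanged, so the values received from the exchange remain available for audit.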

The system described above is, as stated, only one specific 
embodiment of the invention. It relates to a PSTN and, as 
described, deals with call records in a voice communications 
system. Further, the specific form of call records involved, 
System X Type 6, relate to only one type of exchange which 
might be used at a point of interconnection (POI) between 
networks. 

Many changes might be made, however, without depart- 
ing from the spirit of the present invention. A simple 
extension of the application of the invention is that, as well 
as using call record data to generate billing information, 
traffic analysis information can also be picked up and 
processed. For instance, calls which are ineffective in reach- 
ing a destination, "ineffectives", can be counted by the
exchange at the POI and the "bulked" outcome input to the 
data processing system. 

However, more significant changes might include the use 
of the system with communications other than voice 
communications, even excluding voice communications, 
and, as already mentioned, it is clearly not essential that a 
PSTN is involved, although the benefit of embodiments of 
the invention is clearly significant with a PSTN in the light 
of the sheer volume of records and complexity of sources 
involved. 

I claim: 

1. A process for collecting and processing data in a first 
communication network, the data concerning communica- 
tion instances, wherein the network includes at least one 
respective point of connection to at least one other commu- 
nications network, the process comprising the steps of: 

i) collecting data at a data access point at each said point 
of connection, said data concerning a communication 
instance arising in an originating network other than 
said first network, and comprising route information 
identifying the originating network and at least one 
parameter measurement with respect to said commu- 
nication instance; 

ii) transmitting said data into a data processing system; 
iii) processing said data to generate billing information;

iv) allocating said billing information to one of said 
communications networks; and 

v) accumulating respective billing information for each of 
said communications networks[.]; 

wherein said data processing system includes a data 
analyzer, and said processing step includes validating 
the data followed by analyzing invalid data, correcting 
said invalid data and processing said corrected invalid 
data as valid data. 

2. A process as in claim 1, wherein said first network 
comprises a public switched telephone network. 

3. A process as in claim 1 wherein said processing step 
comprises streaming said data according to the identity of 
said originating network. 

4. A process as in claim 1 wherein the first network 
comprises a communication network including both local 
exchanges and trunk exchanges and the data processing 
system includes correlating pricing and charging data from 
a database in accordance with the route information identi- 
fying the originating network. 

5. A process as in claim 4 wherein said correlation is carried
out subsequent to streaming the data. 

6. A process as in claim 1 wherein [said data processing 
system comprises a data analyzer, and said processing step 
includes validating the data followed by analyzing invalid
data,] the analysis [including] includes a step of identifying 
data which can potentially be set to a default value, and 
correcting said invalid data includes setting the data to a 
default value [and processing it as valid data.] 

7. A data processing arrangement for processing data 
collected in a communications network but concerning call 
instances arising outside the network, the arrangement com- 
prising: 

i) a data input for inputting said data, said data comprising 
at least one of a plurality of sort characteristics; 

ii) verifying means for checking the data received at the data
input; 

iii) a data analyzer for analyzing data rejected by the 
verifying means, and for substituting amended or 
default data therefor; 

iv) pricing means for pricing data output by the verifying 
means or by the data analyzer in accordance with 
updatable reference information; 

v) output means for outputting priced data from the 
pricing means into memory locations, each memory 
location being dedicated to data relevant to one or more 
of said sort characteristics, and 

(vi) accumulation means for accumulating price data in 
respect of each communication network causing said 
call instances. 

8. A data processing arrangement as in claim 7 wherein 
each sort characteristic identifies a further network outside 
said communications network in which further network an 
associated communication arose. 

9. An arrangement as in claim 7 wherein said communi- 
cations network is a PSTN. 

10. An arrangement as in claim 7 wherein said data 
analyzer comprises means for storing data which cannot be 
amended or defaulted in a suspended data store, for potential 
subsequent processing. 

11. A data analyzer for use in a data processing arrange- 
ment according to claim 7. 

12. A data collection and processing arrangement for use 
in a first communication network which is connected to and 
receives communication instances from multiple further 
networks, the arrangement comprising: 
a) registering means for registering a communication
instance incoming to the first network having arisen in 
one of said further networks, 

b) means for formatting a record of said communication 
instance, the record comprising data identifying said 
one of the further networks and a parameter value 
associated with the communication instance, 

c) validating means for validating said record, 

d) pricing and charging means for associating pricing and
charging data with a validated record and providing a
sorted array of priced, charged and validated records,
the array being sorted according to the identities of the
further networks, and

e) analyzing means for analyzing records which are 
rejected by the validating means, the analyzing means 
dealing with the rejected records in one of at least three 
ways according to the cause of rejection, said three 
ways being: 

i) to set values in a non-validated record (NVR) to a 
best-fit value, 

ii) to set values in a NVR to default values; and 

iii) to archive or dump the NVR; 

records which have been dealt with in either of the ways i) 
or ii) being transmitted, directly or indirectly, to the pricing 
and charging means as validated records. 

13. An arrangement as in claim 12 wherein: 

a communication instance is received by an exchange of
said first network,

a record of the communication instance being transmitted
to a data collector as said registering means,

the data identifying said one of the further networks being
provided by routing information incorporated in said
record, and

wherein the validating means has access to a routing
reference data model and one of the criteria used in
validating a record is the degree of correlation between
the routing information and the routing reference data
model.

14. An arrangement as in claim 12, wherein the analyzing 
means deals with rejected records in one of at least four 
ways, the four ways comprising i) to iii) and iv) to append
data concerning a NVR to a file in a suspended data store 
which can be accessed and analyzed at a later time. 

15. An arrangement as in claim 14 wherein each file in the 
suspended data store is dedicated to NVRs having the same 
error pattern. 

16. An arrangement as in claim 12 wherein the pricing and 
charging means comprises validating means, or access to 
validating means, and can output non-validated records to 
the analyzing means so as to allow representing of data 
which has become corrupted since first being validated in 
the arrangement. 

17. A data collection and processing system for use in 
collecting and processing communication records relevant 
to a plurality of networks, wherein said system comprises: 

at least one input for communication records generated at 
a point of connection between a first of said plurality of 
networks and at least one other of said plurality of 
networks, 

said records providing identification of the network in 
which an associated communication instance arose or 
from which it entered said first network, 

validation means for validating format and routing infor- 
mation aspects of the records, 
data analyzing means for analyzing errored records
rejected by said validation means, 

the analyzing means being capable of categorizing said 
errored records and applying default values to at least 
one category of the errored records, 

data sorting means for sorting validated and [defaulted 
records] errored records having default values applied 
thereto according to said network identification, and 

pricing means for receiving the sorted records and gen- 
erating billing information for use in billing entities 
relevant to the identified networks; and accumulation 
means for accumulating price data in respect of each 
network. 

18. A process for collecting and processing data in a first 
communications network, the first communications network 
comprising a plurality of switches in common ownership of 
a first party, the data concerning communications instances, 
wherein the network includes at least one point of connec- 
tion to a second communications network, the second com- 
munications network comprising a further plurality of 
switches in common ownership of another party, the process 
comprising the steps of: 

i) collecting data at a data access point at said point of 
connection, said data concerning a communication 
instance arising in an originating network other than 
said first network, and comprising route information 
identifying the originating network and at least one 
parameter measurement with respect to said commu- 
nication instance; 

ii) transmitting said data into a data processing system; 
and 

iii) processing said data[.]; 

wherein said data processing system includes a data 
analyzer, and said processing step includes validating 
the data followed by analyzing invalid data, correcting
said invalid data and processing said corrected invalid 
data as valid data. 

19. A process as in claim 18 wherein said first network 
comprises a public switched telephone network. 

20. A process as in claim 18 wherein said processing step 
comprises streaming said data according to the identity of 
said originating network. 

21. A process as in claim 18 wherein the first network 
comprises a communications network including both local 
exchanges and trunk exchanges and the data processing 
system includes correlating pricing and charging data from 
a database in accordance with the route information identi- 
fying the originating network. 

22. A process as in claim 21 wherein said correlation is 
carried out subsequent to streaming the data. 

23. A process as in claim 18 wherein [said data processing
system comprises a data analyzer and said processing step 
includes validating the data followed by analyzing invalid 
data,] the analysis [including] includes a step of identifying 
data which can potentially be set to a default value, and 
correcting said invalid data includes setting the data to a 
default value [and processing it as valid data]. 

24. A data processing arrangement for collecting and 
processing data in a first communications network, the first 
communications network comprising a plurality of switches 
in common ownership of a first party, the data concerning 
communications instances, wherein the network includes at 
least one point of connection to a second communications 
network, the second communications network comprising a
further plurality of switches in common ownership of 
another party, the arrangement comprising: 

i) a data input for inputting said data, said data comprising 
at least one of a plurality of sort characteristics; 

ii) verifying means for checking the data received at the 
data input; 

iii) a data analyzer for analyzing data rejected by the 
verifying means, and for substituting amended or 
default data therefor; 

iv) pricing means for pricing data output by the verifying 
means or by the data analyzer in accordance with 
updatable reference information; and 

v) output means for outputting priced data from the 
pricing means into memory locations, each memory 
location being dedicated to data relevant to one or more 
of said sort characteristics. 

25. A data processing arrangement as in claim 24 wherein 
each sort characteristic identifies a further network outside
said communications network in which further network an 
associated communication arose. 

26. An arrangement as in claim 24 wherein said commu- 
nications network is a PSTN. 

27. An arrangement as in claim 24 wherein said data 
analyzer comprises means for storing data which cannot be 
amended or defaulted in a suspended data store, for potential 
subsequent processing. 

28. A data analyzer for use in a data processing arrange- 
ment according to claim 24. 

29. A data collection and processing system for collecting 
and processing data in a first communications network, the 
first communications network comprising a plurality of 
switches in common ownership of a first party, the data 
concerning communications instances, wherein the network 
includes at least one point of connection to a second com- 
munications network, the second communications network 
comprising a further plurality of switches in common own- 
ership of another party, the system comprising: 

at least one input for communication records generated at 
a point of connection between a first of said plurality of 
networks and at least one other of said plurality of 
networks, 

said records providing identification of the network in 
which an associated communication instance arose or 
from which it entered said first network, 

validation means for validating format and routing infor- 
mation aspects of the records, 

data analyzing means for analyzing errored records 
rejected by said validation means, 

the analyzing means being capable of categorizing said 
errored records and applying default values to at least 
one category of the errored records, 

data sorting means for sorting validated and [defaulted 
records] errored records having said default values 
applied thereto according to said network 
identification, and 

pricing means for receiving the sorted records and gen- 
erating billing information for use in billing entities 
relevant to the identified networks. 
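The record flow of claim 29 (validate, categorize errored records, apply defaults to one category, sort by network identification, price) can be sketched as below. Everything concrete here is an assumption for illustration: the tariff table, the two error categories, the default value, and the use of a plain list for the suspended data store mentioned in claim 27.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical tariff: charge per second for traffic entering from each network.
RATES: Dict[str, float] = {"NET-A": 0.01, "NET-B": 0.02}
# Hypothetical defaults keyed by error category; categories absent here
# cannot be defaulted and are suspended (compare claim 27).
DEFAULTS: Dict[str, dict] = {"bad_duration": {"duration": 0}}


def validate(rec: dict) -> bool:
    """Check the routing (known network) and format (integer duration) aspects."""
    return rec.get("network") in RATES and isinstance(rec.get("duration"), int)


def categorize(rec: dict) -> str:
    """Assign an error category to a record rejected by validate()."""
    if rec.get("network") not in RATES:
        return "unknown_network"
    return "bad_duration"


def run(records: List[dict]) -> Tuple[Dict[str, List[dict]], Dict[str, float], List[dict]]:
    streams: Dict[str, List[dict]] = defaultdict(list)  # one store per network id
    suspended: List[dict] = []
    for rec in records:
        if not validate(rec):
            category = categorize(rec)
            if category not in DEFAULTS:
                suspended.append(rec)
                continue
            rec = {**rec, **DEFAULTS[category]}  # apply the default value(s)
        streams[rec["network"]].append(rec)      # sort by network identification
    # Pricing step: one billing total per identified network.
    bills = {net: sum(r["duration"] * RATES[net] for r in recs)
             for net, recs in streams.items()}
    return dict(streams), bills, suspended
```

Keeping a separate stream per network mirrors the claimed memory locations dedicated to each sort characteristic, so each interconnecting party can be billed from its own stream.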
ABSTRACT



This file server system appears to the host computer to 
be a plurality of data storage devices which are directly 
addressable by the host computer using the native data 
management and access structures of the host computer. The file server, however, is an intelligent data
storage subsystem that defines, manages and accesses 
synchronized sets of data and maintains these synchro- 
nized sets of data external from the host computer sys- 
tem's data management facilities in a manner that is 
completely transparent to the host computer. This is 
accomplished by the use of the snapshot application 
data group that extends the traditional sequential data 
set processing concept of generation data groups. 

42 Claims, 19 Drawing Sheets 
FILE SERVER HAVING SNAPSHOT
APPLICATION DATA GROUPS 

FIELD OF THE INVENTION

This invention relates to file servers and, in particu- 
lar, to a file server system that creates and manages 
copies of data made external to a host processor. 

PROBLEM

It is a problem in data storage subsystems to define, manage and access copies of data for a host processor. Intelligent data storage subsystems function to manage the data stored therein independent of the host processor in order to relieve the host processor of the burden of attempting to track the physical location of each data record stored in the data storage subsystem. This problem is further exacerbated by the creation of copies of data records in the data storage subsystem, especially when these copies are made independent of the host processor.

Data records are typically copied for a number of reasons, such as disaster backup, test and development of software, multiple concurrent user access, etc. The copy of a data record represents the original data record as it existed at a predefined point in time. In backup applications, copies of data records are periodically made to ensure the integrity of the data records. Since backup copies are made on a regular basis, the risk of data loss is minimized and restricted to the changes that have taken place since the last backup. A more complex problem is file concurrency, wherein a plurality of users on a multi-user system each access a shared data record. The data storage subsystem typically makes a copy of the original data record for each of these users and then must manage this plurality of copies and the changes made thereto in order to safeguard the integrity of the original data record and properly name and manage any new or varied data records created therefrom.

The data record management problem is further complicated by the fact that in today's integrated applications, data is often more than a single data set or database. Data is more typically a set of related data sets and/or databases which must be correctly synchronized in time to ensure consistency of the set of data records. If the various data sets and/or databases are not properly synchronized, then the applications using the data will potentially make logical errors. Examples of this are databases that create adjunct files consisting of indexing information, files consisting of report formats, and user workspace. All of these diverse files must be properly synchronized or the entirety of the data becomes unusable.

Therefore, in data storage subsystems, copies of data records must be made accessible to the end user in a computing environment that may be homogeneous or heterogeneous. This capability must be provided without requiring modifications to the host computers or their proprietary data access methods. However, there presently exists no data storage subsystem that can define sets of data records and manage copies of them external to the host computer in a manner that is transparent to the host computer and ensures the integrity of the sets of data records.



SOLUTION 

The above described problems are solved and a technical advance achieved in the field by the file server system having snapshot application data groups. This file server system appears to the data processor(s) to be a plurality of data storage devices that are directly addressable by each data processor using the native data management and access structures of the data processor. The file server system operates independent of the data processor(s) and can operate in a data storage environment that includes heterogeneous data storage media as well as heterogeneous data processors. The file server system is an intelligent data storage subsystem that defines, manages and accesses synchronized sets of data and maintains these synchronized sets of data external to the data processors' data management facilities in a manner that is completely transparent to the data processor(s). The data storage and management capability can include changing the format of the data stored to accommodate various combinations of heterogeneous data processors.

This is accomplished by the use of the snapshot appli- 
cation data group that extends the traditional sequential 
data set processing concept of generation data groups. 
The snapshot application data groups allow the end user 
to define a set of data sets and/or databases that must be 
synchronized in time. The snapshot application data 
group then allows the end user to reference that set of 
data sets as a single entity for creation, access and dele- 
tion operations. It also provides a mechanism for man- 
aging resources consumed by the copies of the data that 
are created within the file server system. 
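A minimal sketch of the snapshot application data group idea: a user-named set of data sets that is copied, accessed and deleted as one entity, with every member captured at the same instant. The class names, the in-memory dictionary standing in for the file server's data sets, and the use of a lock to freeze writes during creation are assumptions for illustration; this sketch copies values for simplicity, whereas the file server described here replicates pointers instead.

```python
import threading
from typing import Dict, List


class Store:
    """Hypothetical stand-in for data sets held by the file server (name -> bytes)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.lock = threading.Lock()  # writers and snapshot creation share this lock

    def write(self, name: str, value: bytes) -> None:
        with self.lock:
            self.data[name] = value


class SnapshotApplicationDataGroup:
    """A user-defined set of data sets created, accessed and deleted as one entity."""

    def __init__(self, store: Store, members: List[str]) -> None:
        self.store = store
        self.members = list(members)
        self.generations: Dict[int, Dict[str, bytes]] = {}
        self._next_id = 0

    def create(self) -> int:
        # Copy every member while holding the lock, so no write can land between
        # copying one member and the next: all copies reflect the same instant.
        with self.store.lock:
            generation = {name: self.store.data[name] for name in self.members}
        gen_id = self._next_id
        self.generations[gen_id] = generation
        self._next_id += 1
        return gen_id

    def access(self, gen_id: int) -> Dict[str, bytes]:
        return dict(self.generations[gen_id])

    def delete(self, gen_id: int) -> None:
        # The whole generation is released at once, never one data set at a time.
        del self.generations[gen_id]
```

For example, a database and its index file can be grouped so that a later write to the database does not disturb the generation that captured both together.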

In one embodiment of the present invention, a disk array data storage subsystem is used to illustrate the concept of snapshot application data groups. The disk array data storage subsystem comprises a plurality of small form factor disk drives that are interconnected into redundancy groups, each of which contains n+m disk drives for storing n segments of data and m redundancy segments in order to safeguard the integrity of the data stored therein. Each redundancy group functions as a large form factor disk drive whose image is presented to the host processor. The disk array subsystem provides internal control hardware and software to map between the virtual device image as presented to the data processor and the physical devices on which the data records are stored. This mapping consists of tables of pointers that are addressable by the data storage subsystem using the virtual image presented to the data processor; these pointers denote the physical storage location, in the plurality of disk drives of a selected redundancy group, that contains the desired data record.
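The pointer-table mapping and the n+m redundancy group can be sketched together as follows. The text specifies only n data segments and m redundancy segments; this sketch assumes m = 1 with XOR parity, which is one common scheme, and all class and method names are hypothetical.

```python
from functools import reduce
from typing import Dict, List, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class RedundancyGroup:
    """n data drives plus one parity drive (m = 1, XOR parity assumed)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.drives: List[List[bytes]] = [[] for _ in range(n + 1)]

    def write_track(self, data: bytes) -> int:
        seg_len = -(-len(data) // self.n)                 # ceiling division
        padded = data.ljust(seg_len * self.n, b"\0")
        segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(self.n)]
        parity = reduce(xor_bytes, segments)
        for drive, segment in zip(self.drives, segments + [parity]):
            drive.append(segment)
        return len(self.drives[0]) - 1                    # slot used on every drive

    def read_track(self, slot: int, failed_drive: int = -1) -> bytes:
        segments = [drive[slot] for drive in self.drives]
        if 0 <= failed_drive < self.n:
            # Rebuild the lost data segment from the survivors plus parity.
            survivors = [s for i, s in enumerate(segments) if i != failed_drive]
            segments[failed_drive] = reduce(xor_bytes, survivors)
        # Trailing zero padding is not stripped in this sketch.
        return b"".join(segments[:self.n])


class VirtualTrackMap:
    """Pointer table from a virtual (volume, track) to (redundancy group, slot)."""

    def __init__(self, groups: List[RedundancyGroup]) -> None:
        self.groups = groups
        self.pointers: Dict[Tuple[str, int], Tuple[int, int]] = {}

    def write(self, volume: str, track: int, data: bytes, group: int = 0) -> None:
        slot = self.groups[group].write_track(data)
        self.pointers[(volume, track)] = (group, slot)

    def read(self, volume: str, track: int, failed_drive: int = -1) -> bytes:
        group, slot = self.pointers[(volume, track)]
        return self.groups[group].read_track(slot, failed_drive)
```

The data processor only ever names the virtual (volume, track) pair; which drives hold the segments is known only to the pointer table, which is what lets the subsystem present an emulated large drive.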

This data storage subsystem includes a snapshot copy 
capability that creates copies of data records instanta- 
neously by simply replicating pointers. In particular, 
when a copy of a data record is to be made, a new 
pointer is created, addressable at the new virtual ad- 
dress assigned to this copy of the data record, which 
pointer points to the same physical storage location in 
the data storage subsystem as the original data record. 
Therefore, the data storage subsystem simply replicates 
the pointer from the mapping table that points to the 
original set of data and assigns a new virtual address to 
this replicated pointer to enable the data processor to 
access the original data record at two different virtual 
addresses. A physical copy of this data record is created 
only when the data processor makes changes to this data record, or for backup purposes, or if the server repositions the physical data to another media, e.g. for performance reasons. Alternatively, the user may specify that the two copies must remain synchronized. In this case, the two different pointers will always reference the same physical data. This enables multiple user programs (potentially on different processors) to access a common, single physical copy of data via different logical copies.

The snapshot application data group makes use of this capability or an equivalent network functionality to make a copy of a set of data sets that are synchronized at a particular instant in time. This new copy is an instance of a snapshot application data group generation. The set of data sets may constitute a portion of a single volume or it may consist of one or more volumes whose data needs to be synchronized for recovery and data processor access purposes. This copy of the set of data sets is not accessible via the data processor's normal access methods, because the data processor retains knowledge of only the original source set of data sets on the original source virtual volumes. The file server system snapshot application data group manager retains the knowledge of the copied volumes, the user designation for these copied volumes and specific access information, including the mapping information indicative of the physical storage location on the disk drives in a redundancy group wherein the data is stored. The media used to store the data can be a disk array or any other media or combinations of media, such as a disk array in combination with a backend automated magnetic tape cartridge library system including a plurality of tape drives, such that the file server system comprises a hierarchical data storage system containing multiple types of media.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 illustrates in block diagram form the overall architecture of a file server having snapshot application data groups;

FIG. 2 illustrates an overall block diagram of a disk array data storage subsystem;

FIG. 3 discloses additional details of the storage control unit of the disk array data storage subsystem;

FIG. 4 illustrates the format of a copy table;

FIG. 5 illustrates in flow diagram form copy table implementation of a data record copy operation;

FIGS. 6 and 7 illustrate, in flow diagram form, the operational steps taken to perform a data read operation;

FIG. 8 illustrates a typical free space directory used in the data storage subsystem;

FIG. 9 illustrates the format of the virtual track directory;

FIGS. 10 and 11 illustrate, in flow diagram form, the basic and enhanced free space collection processes, respectively;

FIG. 12 illustrates the format of the Logical Cylinder Directory;

FIG. 13 illustrates, in flow diagram form, the operational steps taken to perform a data write operation;

FIG. 14 illustrates a typical free space directory entry;

FIG. 15 illustrates, in flow diagram form, the migrate logical cylinder to secondary media process;

FIG. 16 illustrates additional details of an import/export control unit interface;

FIG. 17 illustrates in flow diagram form the virtual track directory implementation of a data record copy operation;

FIG. 18 illustrates in flow diagram form the steps taken to update the virtual track directory when a virtual track that was copied from another virtual track is written to the disk drives of a redundancy group;

FIG. 19 illustrates in flow diagram form the operational steps taken in a data processor file server read and write interaction; and

FIG. 20 illustrates in flow diagram form the operational steps taken by the system to create a snapshot copy of an application data group.

DETAILED DESCRIPTION OF THE DRAWING

FIG. 1 illustrates in block diagram form the overall architecture of the file server system 1 of the present invention. The file server system 1 is connected to at least one data processor 2 by a data channel 8 which functions to exchange data and control information between the data processor 2 and the file server system 1. Within the data processor 2 is resident an operating system 4 as well as a plurality of user programs 3. In addition, a catalog structure 5 is provided either in the data processor 2 or resident on a memory device 14 connected to the data processor 2. The catalog structure 5 consists of data that is used to map between the virtual address of a particular data set (also referred to as data record) and the physical location on a data storage device on which the data record is stored. The file server system 1 itself consists of a plurality of data storage devices 11-* that are used to store the data records for access by the data processor 2. As can be seen from FIG. 1, the data storage devices 11-* in the file server system 1 are divided into two groups: data storage devices 12 within the functional address space of the data processor 2 and [illegible]. In particular, a data processor 2 can directly address N functional volumes (11-1 to 11-N) of data storage capacity while the file server system 1 can be equipped with M functional volumes 11-* of data storage capacity where M>N. Therefore, the directly addressable memory space available to the data processor 2 is typically less than that provided by the file server system 1. Furthermore, the directly addressable functional volumes (11-1 to 11-N) in the file server system 1 must include a plurality of functional volumes that are used as scratch functional volumes to enable the user programs 3 on the data processor 2 to read and write data thereon. Therefore, the memory address space seen by the data processor 2 consists of only a fraction of the memory capability of the file server system 1. The additional capability of the file server system 1 can therefore be used to store backup copies of the data that is directly addressable by the data processor 2.

File server system 1 operates independent of data processor 2 to manage the storage of data records external to data processor 2 and in a data storage media environment that can be heterogeneous. Multiple types of media can be included within or connected to file server system 1: rotating media data storage devices 11-*, mountable media data storage devices 10 (magnetic tape), or other file server systems 9. File server system 1 dynamically maps the data records identified by data processor 2 into physical storage locations on the data storage media within file server system 1. This mapping can include compression of the data received
from data processor 2 and even changing the format of the data for storage on the data storage media. The dynamic mapping of the data processor 2 address space is accomplished by either modifying operating system 4 to directly manage the data storage, or by including utilities on data processor 2 to correct meta data used to access the data, such as the volume table of contents, or by including a file server system utility 6 on data processor 2 to intercept the data processor input/output commands and manage the input/output in file server system 1. The file server system utility 6 is the preferred embodiment, and the result of this structure is to present a uniform data storage volume image to data processor 2 in a manner that is transparent to data processor 2. The image can be that of "mountable DASD" wherein file server system 1 provides data processor 2 with access to data storage capacity that exceeds the address capacity of data processor 2. In addition, file server system 1 can store data on backend media and then stage data requested by data processor 2 to cache memory or to data storage media directly addressable by data processor 2.

Overview of Data Read/Write Operations

In order to better understand the operation of this apparatus, FIG. 19 illustrates in flow diagram form the operational steps taken by the apparatus illustrated in FIG. 1 to read and write data from the data processor 2 on the file server system 1. At step 1901, a user program 3 resident on data processor 2 issues a read or a write command. The operating system 4 at step 1902 receives the request from the user program 3 and interprets the request to determine what action is required to satisfy this request. At step 1903, the operating system 4 determines the virtual address (e.g. the volume serial number on which they reside) of the data records that are requested by the user program 3. The virtual address is obtained from the catalog structure 5. The catalog structure 5 can consist of multiple levels of information 5c that translate the virtual address provided by the user program 3 into an identification of a device or a functional volume within the file server system 1 that contains the requested data record. Additional physical location identification information 5b may be stored in the catalog 5 and maintained by file server system utility 6 to indicate a particular track and cylinder on the data storage device that contains the data record that has been requested by the user program 3. Alternatively, the data storage device itself may contain a level of catalog information that specifically points to the physical location on the data storage device of the requested data record. The physical location is determined by the operating system 4 at step 1904 by translating the virtual address of the requested data record into a physical location indicia.

In either case, at step 1905, the operating system 4 builds a channel command, or other network request for data, which consists of a command of predetermined format that is used to indicate to the file server system 1 the nature of the request from the data processor 2 and specific data relating to this request that enable the file server system 1 to satisfy the request. In the case of a data record read command, the channel command is identified through its data content as being a data record read request, and additional information is provided in the channel command to identify the virtual address and the specific physical location of the requested data record. At step 1906, the operating system 4 issues the channel command to the channel controller 7, which transmits the channel command over the data link 8 to the file server system 1 to obtain the data records stored therein. At step 1907, the file server system 1 decodes the received channel command to determine the nature of the command and the physical location of the requested data record. Since this is a data record read request, the file server system 1 identifies the functional volume that is identified by the data processor channel command and uses the information contained in the channel command to either directly locate the requested data record or access further catalog information stored in the file server system 1 to identify the physical storage location in the file server system 1 that contains the requested data record. The file server system 1 retrieves the requested data record from its identified physical storage location and transmits the data and a response channel command over the data channel 8 to the data processor 2 at step 1908. At step 1909, the operating system 4, in response to the data transmitted by the file server system 1, returns the data to the user program 3 for use therein.

It is obvious in this high level functional description that the generic process described herein is presently in use but suffers from the limitation that the data processor 2 must manage all of the data records that are stored on the file server system 1. This represents a processing burden on the host processor 2 and significantly limits the amount of memory that can be accessed by the data processor 2 and used to store the data records that are accessible by the user programs 3 resident on the data processor 2.

Generation Data Groups

As can be seen from the architecture of FIG. 1, the address space within the file server system 1 is of greater extent than that directly addressable by the data processor 2. As noted above, the data processor 2 can directly address N functional volumes while the file server system 1 consists of a memory capacity of M functional volumes where M>N. The use of a file server system 1 that operates independent of the data processor 2 to manage the data records stored therein is of significant benefit to the capability of the data processor 2, especially where the file server system 1 automatically manages backup data for the active functional volumes (11-1 to 11-N) that are used by the data processor 2. The user programs 3 on the data processor 2 typically make use not of a single data set but of sets of data sets, wherein a plurality of data sets or data records must be concurrently managed in order to ensure the integrity of the data contained therein. An example of such an application is the use of a database wherein the raw data can be stored in one functional volume 11-2 in the file server system 1 while the indexing information to cross reference the data that is stored in the database system can be stored on yet another functional volume 11-3 in the file server system 1. Furthermore, preformatted standard report programs can also be stored on the file server system 1 on different functional volumes 11-1 therein. Therefore, it is obvious that the sets of data sets can be scattered throughout many functional volumes in the file server system 1, and all of these data sets in the set of data sets must be temporally concurrent in order to prevent the corruption of the data that is stored therein.

A well known functional capability presently found in computer systems is the generation data group. The generation data group is a data management methodology that automatically maintains a predetermined number of most recent versions of a data record. The most recent version of a data record is generally assigned a relative version number of 0, while the next most recent version of the same data record is assigned the relative version number -1. It can be seen from this methodology that a user can define the number of vintages of data that is to be saved on the file server system by simply defining the maximum relative version number that the file server system 1 can store. By setting this threshold number, the file server system 1 can automatically discard the oldest version of this data record when a new version is created by a user program 3 on the data processor 2. This enables the file server system 1 to automatically maintain a predetermined number of prior copies of the data record. Furthermore, the file server system 1 can [illegible] version of the data record when a new version is created. Therefore, in an example where the file server system 1 maintains three versions of a data record, when the user program 3 on the data processor 2 creates a new version of the data record, the oldest or fourth version of the data record that presently is stored on the file server system 1 can be [illegible] to an archive memory that is part of the file server system 1 or an adjunct system connected thereto for automated archiving, while the file server system 1 updates the version numbers of the remaining three copies of the data record stored therein and assigns the new version of the data record received from the data processor 2 the relative version 0 tag. In order for the file server system 1 to most expeditiously manage the plurality of versions of data records that are stored therein, a snapshot copy capability is utilized to quickly make copies of data records and to dynamically move the data records, using a pointer redirection scheme that is described in detail below. Furthermore, when a data record on a functional volume is to be replicated, the address mapping must also be replicated so that the snapshot volume is identical to the original functional volume that is directly addressable by the data processor 2.
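A minimal sketch of the generation data group behavior described above: relative version 0 is the newest, older versions are renumbered -1, -2, and so on, and once the limit is exceeded the oldest version leaves the group. The class and method names are hypothetical, versions are held in memory as byte strings, and the archive memory is modeled as a plain list.

```python
from collections import deque
from typing import Deque, List


class GenerationDataGroup:
    """Keeps the `limit` most recent versions; relative version 0 is the newest."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.versions: Deque[bytes] = deque()  # index k holds relative version -k
        self.archive: List[bytes] = []         # stand-in for the archive memory

    def new_version(self, data: bytes) -> None:
        # Pushing at the front renumbers every existing version: 0 -> -1, -1 -> -2, ...
        self.versions.appendleft(data)
        if len(self.versions) > self.limit:
            self.archive.append(self.versions.pop())  # oldest leaves the group

    def get(self, relative: int) -> bytes:
        """Fetch by relative version number: 0, -1, -2, ..."""
        if relative > 0 or -relative >= len(self.versions):
            raise IndexError(f"no relative version {relative}")
        return self.versions[-relative]
```

With a limit of three, writing a fourth version moves the first one to the archive and leaves versions 0, -1 and -2 on hand, matching the three-version example in the text.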

The file server system 1 operates independent of the data processor 2 and directly manages all of the data records stored therein. By making use of data record pointer information to manage the placement of data records in the copying of these data records, the file server system 1 can quickly transfer files from the functional address space accessible by the data processor 2 to the functional address space that is outside the addressing range of the data processor 2 by simply managing the pointers that reference the physical locations on the data storage devices 11-* contained in the file server system 1. Therefore, without having to physically relocate the data record on the devices contained in the file server system 1, the file server system 1 can effectively accomplish an instantaneous move or copy of a data record in a manner that appears to the data processor 2 to be a physical movement or replication of the data record from one physical device to another physical device, even though the data record is not moved and the pointers thereto are simply managed by the file server system 1. The file server system 1 maintains a dynamic mapping that is transparent to the data processor 2 and redirects a data record to a physical memory location on a data storage device contained within the file server system 1, not as a function of host processor commands but as a function of the mapping memory contained within the file server system 1. The data processor 2 is totally unaware of the mapping and data management that takes place within the file server system, and this obviates the need for the data processor 2 to attempt to manage the data records that are stored on the file server system 1.
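The pointer management described above, together with the snapshot copy behavior introduced earlier (a copy shares physical data until one side is written, unless the user asks for the copies to stay synchronized), can be sketched as a reference-counted pointer table. The names, the string virtual addresses, and the list of physical slots are hypothetical; slots whose count drops to zero are simply left in place here, since reclaiming them belongs to a separate free space collection process.

```python
from collections import Counter
from typing import Dict, List


class PointerStore:
    """Virtual addresses point at physical slots; copies share slots until written."""

    def __init__(self) -> None:
        self.slots: List[bytes] = []       # physical storage locations
        self.vmap: Dict[str, int] = {}     # virtual address -> slot index
        self.refs: Counter = Counter()     # how many virtual addresses use each slot

    def _point(self, vaddr: str, slot: int) -> None:
        old = self.vmap.get(vaddr)
        if old is not None:
            self.refs[old] -= 1
        self.vmap[vaddr] = slot
        self.refs[slot] += 1

    def write(self, vaddr: str, data: bytes, synchronized: bool = False) -> None:
        slot = self.vmap.get(vaddr)
        if slot is not None and (self.refs[slot] == 1 or synchronized):
            self.slots[slot] = data        # update in place (all sharers see it)
        else:
            self.slots.append(data)        # copy-on-write: allocate a new slot
            self._point(vaddr, len(self.slots) - 1)

    def snapshot_copy(self, src: str, dst: str) -> None:
        self._point(dst, self.vmap[src])   # instantaneous: only a pointer is copied

    def move(self, src: str, dst: str) -> None:
        self._point(dst, self.vmap[src])   # new name, same physical slot
        self.refs[self.vmap.pop(src)] -= 1

    def read(self, vaddr: str) -> bytes:
        return self.slots[self.vmap[vaddr]]
```

A copy or move costs one table update regardless of record size; physical data is duplicated only when a shared record is later written without the synchronized option.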

The preferred embodiment of the file server system 1 disclosed herein below is that of a disk array data storage subsystem 100 illustrated in FIG. 2. The disk array data storage subsystem 100 includes an import/export control unit 208-* that interconnects data storage subsystem 100 with additional data storage devices. In the embodiment illustrated in FIG. 2, data storage subsystem 100 has connected thereto an archive memory 10 which in the preferred embodiment is disclosed as a tape drive subsystem that can be an automated magnetic tape cartridge library system to store and retrieve a large number of magnetic tape cartridges 10a, 10b. It is obvious that the specifics of this implementation are but one embodiment that can be used to provide the file server system capability as generally described herein. The disk array data storage subsystem 100 is selected for the preferred embodiment due to its dynamic mapping of virtual devices, wherein the virtual device image presented to the data processor 2 is simply an emulation and the data is stored on physical devices in a manner known only to the data storage subsystem 100 rather than the data processor 2.

Data Storage Subsystem Architecture 

FIG. 2 illustrates in block diagram form the architec- 
ture of the preferred embodiment of the data storage 
subsystem 1, including disk drive array data storage 
subsystem 100. The disk drive array data storage sub- 
system 100 appears to the associated host processors 
11-12 to be a collection of storage devices, such as large 
form factor disk drives with their associated storage 
control, since the architecture of disk drive array data 
storage subsystem 100 is transparent to the associated 
data processors 2 - 2'. This disk drive array data storage 
subsystem 100 includes a plurality of disk drives (122-1 to 125-r) located in a plurality of disk drive subsets 103-1 to 103-i. The disk drives 122-1 to 125-r are significantly less expensive, even while providing disk drives to store redundancy information and providing disk drives for spare purposes, than the typical 14 inch form factor disk drive with an associated backup disk drive. The plurality of disk drives 122-1 to 125-r are typically the commodity hard disk drives in the 5¼ inch form factor.

The architecture illustrated in FIG. 2 is that of a plurality of data processors 2-2' interconnected via their respective data channels 21, 22 and 31, 32 to a data storage subsystem 100 that provides the backend data storage capacity for the data processors 2 - 2'. This basic configuration is well known in the
data processing art. The data storage subsystem 100 
includes a control unit 101 that serves to interconnect 
the subsets of disk drives 103-1 to 103-i and their associ- 
ated drive managers 102-1 to 102-i with the data chan- 
nels 21 - 22, 31 - 32 that interconnect data storage sub- 
system 100 with the plurality of data processors 2 - 2'. 

Control unit 101 includes typically two cluster con- 
trols 111, 112 for redundancy purposes. Within a cluster 
control 111 the multipath storage director 110-0 pro- 
vides a hardware interface to interconnect data chan- 
nels 21, 31 to cluster control 111 contained in control 
unit 101. In this respect, the multipath storage director 
110-0 provides a hardware interface to the associated 
data channels 21, 31 and provides a multiplex function 
to enable any attached data channel (ex 21) from any data
processor 2 to interconnect to a selected cluster control
111 within control unit 101. The cluster control 111
itself provides a pair of storage paths 201-0, 201-1 which
function as an interface to a plurality of optical fiber
backend channels 104. In addition, the cluster control
111 includes a data compression function as well as a
data routing function that enables cluster control 111 to
direct the transfer of data between a selected data
channel 21 and cache memory 113, and between cache
memory 113 and one of the connected optical fiber backend
channels 104. Control unit 101 provides the major data
storage subsystem control functions that include the
creation and regulation of data redundancy groups,
reconstruction of data for a failed disk drive, switching
a spare disk drive in place of a failed disk drive, data
redundancy generation, logical device space management,
and virtual to logical device mapping. These subsystem
functions are discussed in further detail below.

Disk drive manager 102-1 interconnects the plurality
of commodity disk drives 122-1 to 125-r included in disk
drive subset 103-1 with the plurality of optical fiber
backend channels 104. Disk drive manager 102-1 includes
an input/output circuit 120 that provides a hardware
interface to interconnect the optical fiber backend
channels 104 with the data paths 126 that serve control
and drive circuits 121. Control and drive circuits 121
receive the data on conductors 126 from input/output
circuit 120 and convert the form and format of these
signals as required by the associated commodity disk
drives in disk drive subset 103-1. In addition, control
and drive circuits 121 provide a control signalling
interface to transfer signals between the disk drive
subset 103-1 and control unit 101.

The data that is written onto the disk drives in disk
drive subset 103-1 consists of data that is transmitted
from an associated data processor 2 over data channel
21 to one of cluster controls 111, 112 in control unit 101.
The data is written into, for example, cluster control 111
which stores the data in cache 113. Cluster control 111
stores N physical tracks of data in cache 113 and then
generates M redundancy segments for error correction
purposes. Cluster control 111 then selects a subset of
disk drives (122-1 to 122-n+m) to form a redundancy
group to store the received data. Cluster control 111
selects an empty logical track, consisting of N+M
physical tracks, in the selected redundancy group. Each
of the N physical tracks of the data is written onto one
of N disk drives in the selected data redundancy group.
An additional M disk drives are used in the redundancy
group to store the M redundancy segments. The M
redundancy segments include error correction characters
and data that can be used to verify the integrity of
the N physical tracks that are stored on the N disk
drives as well as to reconstruct one or more of the N
physical tracks of the data if that physical track were
lost due to a failure of the disk drive on which that
physical track is stored.

Thus, data storage subsystem 100 can emulate one or
more storage devices, e.g. large form factor disk drives
(ex an IBM 3380 K type of disk drive), using a plurality
of smaller form factor disk drives while providing a
high system reliability capability by writing the data
across a plurality of the smaller form factor disk drives.
A reliability improvement is also obtained by providing
a pool of R spare disk drives (125-1 to 125-r) that are
switchably interconnectable in place of a failed disk
drive. Data reconstruction is accomplished by the use
of the M redundancy segments, so that the data stored
on the remaining functioning disk drives combined with
the redundancy information stored in the redundancy
segments can be used by control software in control
unit 101 to reconstruct the data lost when one or more
of the plurality of disk drives in the redundancy group
fails (122-1 to 122-n+m). This arrangement provides a
reliability capability similar to that obtained by disk
shadowing arrangements at a significantly reduced cost
over such an arrangement.

Disk Drive

Each of the disk drives 122-1 to 125-r in disk drive
subset 103-1 can be considered a disk subsystem that
consists of a disk drive mechanism and its surrounding
control and interface circuitry. The disk drive consists
of a commodity disk drive which is a commercially
available hard disk drive of the type that typically is
used in personal computers. A control processor
associated with the disk drive has control responsibility
for the entire disk drive and monitors all information
routed over the various serial data channels that connect
each disk drive 122-1 to 125-r to control and drive
circuits 121. Any data transmitted to the disk drive over
these channels is stored in a corresponding interface
buffer which is connected via an associated serial data
channel to a corresponding serial/parallel converter
circuit. A disk controller is also provided in each disk
drive to implement the low level electrical interface
required by the commodity disk drive. The commodity
disk drive has an interface which must be interfaced
with control and drive circuits 121. The disk controller
provides this function. The disk controller provides
serialization and deserialization of data, CRC/ECC
generation, checking and correction, and NRZ data
encoding. The addressing information, such as the head
select and other types of control signals, is provided by
control and drive circuits 121 to commodity disk drive
122-1. This communication path is also provided for
diagnostic and control purposes. For example, control
and drive circuits 121 can power a commodity disk drive
down when the disk drive is in the standby mode. In this
fashion, the commodity disk drive remains in an idle
state until it is selected by control and drive circuits 121.

Control Unit

FIG. 3 illustrates in block diagram form additional
details of cluster control 111. Multipath storage director
110 includes a plurality of channel interface units 201-0
to 201-7, each of which terminates a corresponding pair
of data channels 21, 31. The control and data signals
received by the corresponding channel interface unit
201-0 are output on either of the corresponding control
and data buses 206-C, 206-D, or 207-C, 207-D,
respectively, to either storage path 200-0 or storage path
200-1. Thus, as can be seen from the structure of the
cluster control 111 illustrated in FIG. 3, there is a
significant amount of symmetry contained therein. Storage
path 200-0 is identical to storage path 200-1 and only
one of these is described herein. The multipath storage
director 110 uses two sets of data and control busses
206-D, C and 207-D, C to interconnect each channel
interface unit 201-0 to 201-7 with both storage path
200-0 and 200-1 so that the corresponding data channel
21 from the associated data processor 2 can be switched
via either storage path 200-0 or 200-1 to the plurality of
optical fiber backend channels 104. Within storage path
200-0 is contained a processor 204-0 that regulates the
operation of storage path 200-0. In addition, an optical
device interface 205-0 is provided to convert between
the optical fiber signalling format of optical fiber back- 
end channels 104 and the metallic conductors contained 
within storage path 200-0. Channel interface control 
202-0 operates under control of processor 204-0 to con- 
trol the flow of data to and from cache memory 113 and 
the one of channel interface units 201 that is presently 
active within storage path 200-0. The channel interface 
control 202-0 includes a cyclic redundancy check 
(CRC) generator/checker to generate and check the
CRC bytes for the received data. The channel interface
circuit 202-0 also includes a buffer that compensates for 
speed mismatch between the data transmission rate of 
the data channel 21 and the available data transfer capa- 
bility of the cache memory 113. The data that is re- 
ceived by the channel interface control circuit 202-0 
from a corresponding channel interface circuit 201 is 
forwarded to the cache memory 113 via channel data 
compression circuit 203-0. The channel data compres- 
sion circuit 203-0 provides the necessary hardware and 
microcode to perform compression of the channel data 
for the control unit 101 on a data write from the data 
processor 2. It also performs the necessary decompression
operation for control unit 101 on a data read operation
by the data processor 2.
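The string compression performed by circuit 203-0 is described later (under Adaptive Data Compression Function): strings are built one byte at a time, the reference value of the longest already-defined string is emitted, and the next unused reference value is assigned to the first undefined extension. That loop is essentially the LZW member of the Lempel-Ziv family. The Python sketch below shows only that string form, under stated assumptions: the table starts with all 256 single-byte values predefined, reference values are unbounded integers rather than fixed-width bit codes, and the run-length codes, per-segment minimum/maximum byte tracking, and CRC are omitted. The function names and the segment_size parameter are hypothetical.

```python
from __future__ import annotations


def lzw_compress(data: bytes) -> list[int]:
    """Emit reference values for the longest previously defined strings."""
    # Assumption: every single byte value is predefined as its own reference.
    table: dict[bytes, int] = {bytes([i]): i for i in range(256)}
    next_ref = 256  # next previously undefined reference value
    output: list[int] = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Already defined: keep extending the string.
            current = candidate
            continue
        # First undefined extension: output the last defined string and
        # assign the next reference value to the new, longer string.
        output.append(table[current])
        table[candidate] = next_ref
        next_ref += 1
        # Restart the search from the most recently used data byte.
        current = bytes([byte])
    if current:
        output.append(table[current])
    return output


def lzw_decompress(refs: list[int]) -> bytes:
    """Rebuild the same string table while expanding reference values."""
    if not refs:
        return b""
    table: dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    next_ref = 256
    previous = table[refs[0]]
    pieces = [previous]
    for ref in refs[1:]:
        if ref in table:
            entry = table[ref]
        elif ref == next_ref:
            # The compressor defined this reference in the same step it emitted it.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid reference value {ref}")
        pieces.append(entry)
        table[next_ref] = previous + entry[:1]
        next_ref += 1
        previous = entry
    return b"".join(pieces)


def compress_segments(record: bytes, segment_size: int) -> list[list[int]]:
    """Compress fixed-size segments independently, each with a fresh table."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [lzw_compress(record[i:i + segment_size])
            for i in range(0, len(record), segment_size)]
```

For instance, `lzw_compress(b"ABABABA")` returns `[65, 66, 256, 258]`, where 256 stands for "AB" and 258 for "ABA", both defined on the fly. Starting each segment with a fresh table keeps reference values short, which matches the text's stated reason for compressing segments independently.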

As can be seen from the architecture illustrated in
FIG. 2, all data transfers between a data processor 2 and
a redundancy group in the disk drive subsets 103 are
routed through cache memory 113. Control of cache
memory 113 is provided in control unit 101 by processor
204-0. The functions provided by processor 204-0
include initialization of the cache directory and other 
cache data structures, cache directory searching and 
management, cache space management, cache perfor- 
mance improvement algorithms as well as other cache 
control functions. In addition, processor 204-0 creates 
the redundancy groups from the disk drives in disk 
drive subsets 103 and maintains records of the status of 
those devices. Processor 204-0 also causes the redun- 
dancy data across the N data disks in a redundancy 
group to be generated within cache memory 113 and 
writes the M segments of redundancy data onto the M 
redundancy disks in the redundancy group. The func- 
tional software in processor 204-0 also manages the 
mappings from virtual to logical and from logical to 
physical devices. The tables that describe this mapping
are updated, maintained, backed up and occasionally
recovered by this functional software on processor
204-0. The free space collection function is also
performed by processor 204-0, as well as management and
scheduling of the optical fiber backend channels 104.
Many of these above functions are well known in the
data processing art and are not described in any detail
herein.

Data Compression Capabilities

Data stored on disks and tapes or transferred over
communication links in a computer system generally
contains significant redundancy. Data compression
algorithms improve the efficiency with which data is
stored or transmitted by reducing the amount of
redundant data. A compression algorithm takes a source
text as input and produces a corresponding compressed
text. An expansion algorithm takes the compressed text
as input and produces the original source text as an
output. There are four types of redundancy that are
typically found in a data file. The first type of
redundancy is character distribution redundancy. In a
typical character string, some characters are used more
frequently than others. Specifically, the eight bit ASCII
representations used to encode characters are not all
used in a typical character string. Nearly three fourths
of the possible 256 character representations may not be
used in a specific file. Consequently, nearly two bits of
each eight bit representation could be removed without
affecting the data content of the character string (if
only about 64 of the 256 possible values occur, 6 bits
suffice to distinguish them). This is a twenty-five
percent savings in space and encoding time. The second
type of redundancy is character repetition redundancy.
When a string of repetitions of a single character
occurs, the message can be encoded more compactly than
by just repeating the character symbol. The character
repetition strings are infrequent in text but fairly
common in formatted business files where unused space is
very common. In addition, graphical images contain a
significant amount of character repetition redundancy.
A third form of redundancy consists of high usage
patterns. Certain sequences of characters appear with
relatively high frequency in a particular data file and
therefore can be represented with relatively fewer bits
for a net savings in data storage space and string
encoding time. Thus, frequently occurring patterns are
encoded using fewer bits while infrequently occurring
patterns are encoded using more bits. The fourth type of
redundancy is positional redundancy. If certain
characters appear consistently at a predictable place in
each block of data, then the characters are at least
partially redundant. Examples of positional redundancy
are charts and pictures.

The most popular method of data compression is
Huffman type coding which translates fixed sized pieces
of input data into variable length symbols. The Huffman
encoding procedure assigns codes to input symbols such
that more probable symbols receive shorter codes, each
code length being roughly proportional to the negative
logarithm of the probability of the symbol occurring in
the data. In normal use, the size of the input symbols
is limited by the size of the translation table needed
for compression. That is, a table is needed that lists
each input symbol and its corresponding code. A second
problem with Huffman encoding is the complexity of the
decompression process. The length of each code to be
interpreted for decompression is not known until the
first few bits are interpreted. An improvement over
Huffman coding is an adaptive compression algorithm such
as the Lempel-Ziv category of algorithms that converts
variable length strings of input symbols into fixed
length codes. This form of data compression is effective
at exploiting character frequency redundancy, character
repetition redundancy, and high usage pattern redundancy
but is not generally effective on positional redundancy.
This algorithm is adaptive in the sense that it starts
each field with an empty table of symbol strings and
builds the table during both the compression and
decompression processes. These are one-pass procedures
that require no prior information about the input data
statistics and execute in time proportional to the
length of the message.

The length of a compressed image for a given message
is unpredictable because it depends on the content of
the message. There is no assurance prior to data
compression that a message will compress at all; in some
cases it may even expand. Therefore, the space allocated
for the compressed image must be at least as big as the
space allocated for the original message. In addition,
an update to a data record that alters just a few
characters of the data record can change the compressed
size and may result in a required change in allocation
even for a minor update. Therefore, the above-described
process used by data storage subsystem 100 to perform
modifications to data records overcomes this minor
update impact on compressed data, since a modified
data record is always written to an empty logical
cylinder and the old version of the data record is
flagged as obsolete.

Data Compaction

As a data record is received from data processor 2 by
channel interface control 202-0, and buffered therein,
processor 204-0 deletes all gaps between fields in the
received count key data record. The virtual device and
virtual cylinder addresses are extracted from the count
key data format data record and used to create an entry
in the virtual cylinder directory stored in cache memory
113. The data fields of the received data record are
forwarded to channel data compression circuit 203-0 for
compression and temporary storage in cache memory
113. Thus, all that is stored in the redundancy groups
are logical cylinders of compressed data in fixed block
architecture format since the headers, gaps and received
space in the received count key data are deleted. A
further compaction process is the creation of null
virtual tracks. Each time data processor 2 creates a new
instance of a data file, a predetermined data file extent
is reserved by data processor 2. Channel interface
control 202-0 and processor 204-0 eliminate the need to
reserve this unused memory space by simply creating a
series of null entries in the virtual track directory; no
data is written to a redundancy group.

Adaptive Data Compression Function

The adaptive data compression apparatus 203-0 is
located within control unit 101 which is interposed
between a plurality of host processor channel interface
units 201 and cache memory 113. The adaptive data
compression apparatus 203-0 functions to efficiently
compress the records of a user data file received from
the data processor 2 into a bit oriented compressed
format for storage in cache memory 113 and disk drives
122. The data compression apparatus 203-0 divides each
record of an incoming stream of data records into
predetermined sized segments, each of which is
compressed independently without reference to any other
segment in the stream of data records. The data
compression apparatus 203-0 concurrently uses a plurality
of data compression algorithms to adapt the data
compression operation to the particular data stored in
the user data record. A cyclic redundancy check circuit
is used to compute a predetermined length CRC code from
all of the incoming user data bytes before they are
compressed. The computed CRC code is appended to the
end of the compressed data block.

The data compression apparatus 203-0 operates by
converting bytes and strings of bytes into shorter bit
string codes called reference values. The reference
values replace the bytes and strings of bytes when
recorded on the disk drives 122. The byte strings have
two forms, a run length form for characters that are
repeated three or more times, and a string form that
recognizes character patterns of two or more characters.
Two variables are used to indicate the maximum and
minimum byte values in a particular segment.

Strings of two or more bytes are compressed by
assigning a reference value to each defined string using
an adaptive data compression algorithm. Subsequent
occurrences of that string are replaced by its string
reference value. Strings are constructed a character at a
time, where a previously defined string plus the next
user data byte defines a new string and is assigned the
next previously undefined reference value. Thus, strings
become longer and data compression more efficient as
more user data bytes in the segment are examined.
However, as more strings are defined, greater length
reference values are needed to uniquely identify a
string, reducing the efficiency of the compression
process. This factor makes it desirable to divide a data
record into segments which are compressed
independently. String definition occurs by combining the
last used reference value with the next user data byte
from the input data stream, then searching to see if this
string has been previously defined. If it has, the next
byte is concatenated to this new byte string reference
value and a search is again conducted to see if this
extended byte string has been previously defined as a
string. This sequence is continued until a string is
located that has not been previously defined. The last
used defined string reference value is put in the
compressed output data stream and the next previously
undefined reference value is assigned to define the last
string that was not found. The search procedure is
initiated over again starting with the most recently used
user data byte.

Runs of three or more repeated bytes are encoded
using a predetermined set of reserved reference values
to indicate that the preceding character was repeated
the number of times specified by the repeat code. The
immediately preceding character is not re-included in
the repeat count. Run length encoding takes precedence
over string data compression. Run length encoding,
single byte compression, and string data compression
are intermixed in any compressed segment within the
user data record.

If the size of a compressed segment in bytes is larger
than its size before compression, then the segment is not
compressed and is recorded in uncompressed format.

Import/Export Control Unit Interface

The file server system 1 can include several types of
media and these can be disk drives 122-* of differing
format and capacity or even different media, such as
tape 10, connected to data storage subsystem 100 via an
import/export interface unit 208-*. Data records can be
migrated or copied, either automatically within file
server system 1 or by command from data processor 2,
between data storage subsystem 100, a similar subsystem
at another location, and/or tape drive control unit
10. In the particular embodiment disclosed herein, a
tape drive control unit 10 is described, but it is
understood that import/export interface unit 208-* can be
connected to any data storage media or to a network
that can be used to access data storage media. This data
storage media can be collocated with data storage
subsystem 100 or can be spatially disjunct therefrom as
in the case of a geographically distant secure archive
facility.

FIG. 16 illustrates in block diagram form additional
details of the tape drive control unit interface 208-1
which is connected via data channel 20 to tape drive
control unit 10 which interconnects the data channel 20
with a plurality of tape drives. Tape drive control unit
interface 208 is similar in structure to a data channel
interface circuit 201 and functions like a host channel
interface so that the tape drive control unit 10 believes
that data channel 20 is a normal IBM OEMI type
channel. FIG. 16 illustrates the master sequence control
1601 which is the main functional control of the tape
drive control unit interface circuit 208. All other control
functions in the tape drive control unit interface circuit
208 are slaves to the master sequence control circuit
1601. Master sequence control 1601 recognizes and
responds to sequences of events that occur on the data
channel 20 for those initiated by elements within
control cluster 111. Master sequence control 1601
contains a microsequencer, instruction memory, bus
source and destination decode registers and various
other registers as are well known in the art. A plurality
of bus input receivers 1603 and bus output drivers 1602
and tag receivers 1604 and drivers 1605 are provided to
transmit tag or bus signals to the tape drive control
unit 10. These transmitters and receivers conform to the
requirements set in the IBM OEMI specification so that
normal IBM channels can be used to connect data
storage subsystem 100 with a conventional tape drive
control unit 10. The details of these drivers and
receivers are well known in the art and are not disclosed
in any detail herein. Control signals and data from
processor 204 in cluster control 111 are received in the
tape drive control unit interface 208-1 through the
control bus interface 1606 which includes a plurality of
drivers and receivers 1607, 1608 and an interface adapter
1609 which contains FIFOs to buffer the data transmitted
between the main bus of the tape drive control unit
interface circuit 208-1 and data and control busses
206-D, 206-C, respectively. Furthermore, automatic data
transfer interface 1610 is used to transfer data between
the tape interface drivers and receivers 1602, 1603 and
cache memory 113 on bus CH ADT via receivers and
transmitters 1611, 1612. Thus, the function of tape drive
control unit interface circuit 208-1 is similar to that of
channel interface circuits 201, and it serves to
interconnect a standard tape drive control 10 via data
channel 20 to data storage subsystem 100 to exchange
data and control information therebetween.

Dynamic Virtual Device to Logical Device Mapping

With respect to data transfer operations, all data
transfers go through cache memory 113. Therefore,
front end or channel transfer operations are completely
independent of backend or device transfer operations.
In this system, staging operations are similar to staging
in other cached disk subsystems but destaging transfers
are collected into groups for bulk transfers. In addition,
this data storage subsystem 100 simultaneously performs
free space collection, mapping table backup, and error
recovery as background processes. Because of the
complete front end/backend separation, the data storage
subsystem 100 is liberated from the exacting processor
timing dependencies of previous Count Key Data disk
subsystems. The subsystem is free to dedicate its
processing resources to increasing performance
through more intelligent scheduling and data transfer
control.

The disk drive array data storage subsystem 100
consists of three abstract layers: virtual, logical and
physical. The virtual layer functions as a conventional
large form factor disk drive memory. The logical layer
functions as an array of storage units that are grouped
into a plurality of redundancy groups (ex 122-1 to
122-n+m), each containing N+M disk drives to store N
physical tracks of data and M physical tracks of
redundancy information for each logical track. The
physical layer functions as a plurality of individual small
form factor disk drives, or other media. The data storage
management system operates to effectuate the mapping
of data among these abstract layers and to control the
allocation and management of the actual space on the
physical devices. These data storage management
functions are performed in a manner that renders the
operation of the disk drive array data storage subsystem
100 transparent to the data processors (2 - 2').

A redundancy group consists of N+M disk drives.
The redundancy group is also called a logical volume or
a logical device. Within each logical device there are a
plurality of logical tracks, each of which is the set of all
physical tracks in the redundancy group which have the
same physical track address. These logical tracks are
also organized into logical cylinders, each of which is
the collection of all logical tracks within a redundancy
group which can be accessed at a common logical
actuator position. Disk drive array data storage subsystem
100 appears to the host processor to be a collection of
large form factor disk drives, each of which contains a
predetermined number of tracks of a predetermined size
called a virtual track. Therefore, when the data
processor 2 transmits data over the data channel 21 to the
data storage subsystem 100, the data is transmitted in the
form of the individual records of a virtual track. In
order to render the operation of the disk drive array
data storage subsystem 100 transparent to the data
processor 2, the received data is stored on the actual
physical disk drives (122-1 to 122-n+m) in the form of
virtual track instances which reflect the capacity of a
track on the large form factor disk drive that is emulated
by data storage subsystem 100. Although a virtual track
instance may spill over from one physical track to the
next physical track, a virtual track instance is not
permitted to spill over from one logical cylinder to
another. This is done in order to simplify the
management of the memory space.

When a virtual track is modified by the data
processor 2, the updated instance of the virtual track is not
rewritten in data storage subsystem 100 at its original
location but is instead written to a new logical cylinder
and the previous instance of the virtual track is marked
obsolete. Therefore, over time a logical cylinder
becomes riddled with "holes" of obsolete data known as
free space. In order to create whole free logical
cylinders, virtual track instances that are still valid and
located among fragmented free space within a logical
cylinder are relocated within the disk drive array data
storage subsystem 100 in order to create entirely free
logical cylinders. In order to evenly distribute data
transfer activity, the tracks of each virtual device are
scattered as uniformly as possible among the logical
devices in the disk drive array data storage subsystem
100. In addition, virtual track instances are padded out
if necessary to fit into an integral number of physical
device sectors. This is to ensure that each virtual track
instance starts on a sector boundary of the physical
device.

Virtual Track Directory

FIG. 9 illustrates the format of the virtual track
directory 900 that is contained within cache memory 113.
The virtual track directory 900 consists of the tables
that map the virtual addresses as presented by data
processor 2 to the logical drive addresses that are used by
control unit 101. There is another mapping that takes
place within control unit 101 and this is the logical to
physical mapping to translate the logical address
defined by the virtual track directory 900 into the exact
physical location of the particular disk drive or
secondary media that contains data identified by the data
processor 2. The virtual track directory 900 is made up of
two parts: the virtual track directory pointers 901 in the
virtual device table 902 and the virtual track directory
903 itself. The virtual track directory 903 is not
contiguous in cache memory 113 but is scattered about the
physical extent of cache memory 113 in predefined 
segments (ex 903-1). Each segment 903-1 has a virtual to 
logical mapping for a predetermined number of cylinders,
for example 64 cylinders worth of IBM 3380 type
DASD tracks. In the virtual device table 902, there are
pointers to as many of these segments 903 as needed to
emulate the number of cylinders configured for each of
the virtual devices defined by data processor 2. The
virtual track directory 900 is created by control unit 101
at the virtual device configuration time. When a virtual
volume is configured, the number of cylinders in that
volume is defined by the data processor 2. A segment
903-1 or a plurality of segments of volatile cache memory
113 are allocated to this virtual volume defined by
data processor 2 and the virtual device table 902 is
updated with the pointers to identify these segments 903
contained within cache memory 113. Each segment 903
is initialized with no pointers to indicate that the virtual
tracks contained on this virtual volume have not yet
been written. Each entry 905 in the virtual device table
is for a single virtual track and is addressed by the
virtual track address. As shown in FIG. 9, each entry 905
is 64 bits long. The entry 905 contents are as follows,
starting with the high order bits:

Bit 63: Migrated to Secondary Media Flag.

Bit 62: Source Flag.

Bit 61: Target Flag.

Bits 60-57: Logical volume number. This entry corresponds to the logical volume table described above.

Bits 56-46: Logical cylinder address. This data entry is identical to the physical cylinder number.

Bits 45-31: Sector offset. This entry is the offset to the start of the virtual track instance in the logical cylinder, not including the parity track sectors. These sectors typically contain 512 bytes.

Bits 30-24: Virtual track instance size. This entry notes the number of sectors that are required to store this virtual track instance.

Bits 23-0: Virtual Track Access Counter. This entry contains a running count of the number of times the Virtual Track has been staged.
If the Migrated to Secondary Media Flag is clear, the rest of the entry contains the fields described above. If the Migrated to Secondary Media Flag is set, the Logical Cylinder containing the Virtual Track has been migrated to the Secondary Media and the rest of the entry contains a pointer to a Secondary Media Directory.
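To make the bit layout concrete, the following minimal Python sketch packs and unpacks a 64-bit entry using the field boundaries listed above. The names `VtdEntry`, `pack`, `unpack` and `secondary_media_pointer` are illustrative and do not come from the source. Treating the low 63 bits as the Secondary Media Directory pointer when bit 63 is set is an assumption, because the text does not specify that encoding.

```python
from dataclasses import dataclass
from typing import Optional

# (shift, width) of each field in the 64-bit entry, high-order bits first.
LAYOUT = {
    "migrated": (63, 1),           # Bit 63
    "source": (62, 1),             # Bit 62
    "target": (61, 1),             # Bit 61
    "logical_volume": (57, 4),     # Bits 60-57
    "logical_cylinder": (46, 11),  # Bits 56-46
    "sector_offset": (31, 15),     # Bits 45-31
    "instance_size": (24, 7),      # Bits 30-24
    "access_counter": (0, 24),     # Bits 23-0
}

@dataclass
class VtdEntry:
    migrated: int = 0
    source: int = 0
    target: int = 0
    logical_volume: int = 0
    logical_cylinder: int = 0
    sector_offset: int = 0
    instance_size: int = 0
    access_counter: int = 0

def pack(entry: VtdEntry) -> int:
    """Assemble the fields into one 64-bit word, rejecting values that overflow."""
    word = 0
    for name, (shift, width) in LAYOUT.items():
        value = getattr(entry, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word

def unpack(word: int) -> VtdEntry:
    """Split a 64-bit word into fields (meaningful only when bit 63 is clear)."""
    return VtdEntry(**{name: (word >> shift) & ((1 << width) - 1)
                       for name, (shift, width) in LAYOUT.items()})

def secondary_media_pointer(word: int) -> Optional[int]:
    """If bit 63 is set, return the low 63 bits as the (assumed) pointer into
    the Secondary Media Directory; otherwise return None."""
    if (word >> 63) & 1:
        return word & ((1 << 63) - 1)
    return None
```

For example, `unpack(pack(VtdEntry(source=1, logical_cylinder=200)))` gives back the same entry. The fixed field widths also show why the access counter saturates at 2^24 - 1 stagings unless the controller handles wraparound in some other way.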

Secondary Media Directory

The Secondary Media Directory contains pointers to all the data that has been migrated and is no longer resident on the DASD contained in the subsystem. The Secondary Media Directory also contains a Retrieving flag for each Logical Cylinder indicating that the data is in the process of being retrieved. The Secondary Media Directory is kept in cache and is backed up along with the Virtual Track Directory to allow recovery in the event of a cache failure.
Logical Cylinder Directory

FIG. 12 illustrates the format of the Logical Cylinder Directory. Each Logical Cylinder that is written contains a Logical Cylinder Directory (LCD) in its last few sectors. The LCD is an index to the data in the Logical Cylinder and is used primarily by Free Space Collection to determine which Virtual Track Instances in the Logical Cylinder are valid and need to be collected. FIG. 12 shows the LCD in graphic form. The Logical Cylinder Sequence Number uniquely identifies the Logical Cylinder and the sequence in which the Logical Cylinders were created. It is used primarily during Mapping Table Recovery operations. The Logical Address is used to confirm the cylinder's location for data integrity purposes. The LCD Entry Count is the number of Virtual Track Instances contained in the Logical Cylinder and is used when scanning the LCD Entries. The Logical Cylinder Collection History records when the cylinder was created, whether it was created from Updated Virtual Track Instances or from data collected from another cylinder, and, if it was created from collected data, the nature of the collected data. Each LCD Entry contains the identifier of the virtual track and the identifier of the relative sector within the logical cylinder in which the virtual track instance begins.
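A minimal Python model of the LCD fields described above may help fix the structure in mind. The class and method names are illustrative. The LCD Entry Count is stored on disk, but in this sketch it is derived from the entry list, and the collection history is kept as a free-form string because its encoding is not given.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class LcdEntry:
    virtual_track: int     # identifier of the virtual track
    relative_sector: int   # sector in this logical cylinder where the instance begins

@dataclass
class LogicalCylinderDirectory:
    sequence_number: int       # unique; also records creation order
    logical_address: int       # where this cylinder claims to reside
    collection_history: str    # e.g. "updated" or "collected"; free-form here
    entries: List[LcdEntry] = field(default_factory=list)

    @property
    def entry_count(self) -> int:
        # Stored as a field on disk; derived from the entry list in this sketch.
        return len(self.entries)

    def confirm_location(self, address_read_from: int) -> bool:
        """Integrity check: the directory must name the address it was read from."""
        return self.logical_address == address_read_from

def in_creation_order(lcds: List[LogicalCylinderDirectory]) -> List[LogicalCylinderDirectory]:
    """Order directories by Logical Cylinder Sequence Number, oldest first."""
    return sorted(lcds, key=lambda d: d.sequence_number)
```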
Free Space Directory 

The storage control also includes a free space directory (FIG. 8), which is a list of all of the logical cylinders in the disk drive array data storage subsystem 100 ordered by logical device. Each logical device is cataloged in two lists, called the free space list and the free cylinder list for the logical device; each list entry represents a logical cylinder and indicates the amount of free space that this logical cylinder presently contains. This free space directory contains a positional entry for each logical cylinder; each entry includes both forward and backward pointers for the doubly linked free space list for its logical device and the number of free sectors contained in the logical cylinder. Each of these pointers points either to another entry in the free space list for its logical device or is null. In addition to the pointers and free sector count, the Free Space Directory also contains entries that do not relate to free space but relate to the Logical Cylinder. There is a flag byte known as the Logical Cylinder Table (LCT), which contains, among other flags, a Collected Flag and an Archive Flag. The Collected Flag is set when the logical cylinder contains data that was collected from another cylinder. The Archive Flag is set when the logical cylinder contains data that was collected from a logical cylinder which had its Collected Flag set. If either of these flags is set, the Access Counter and the Last Access Date/Time are valid. The Creation Time/Date is valid for any cylinder that is not free.
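The positional entries and doubly linked free space list can be sketched in Python as follows. Entries are indexed by logical cylinder number, and the forward and backward pointers hold cylinder numbers or `None` for null. Keeping the list sorted by descending free sectors is an assumption made for illustration, suggested by the "Reorder Free List, if necessary" step in the copy pseudo code but not stated here. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FsdEntry:
    free_sectors: int = 0
    prev: Optional[int] = None   # backward pointer (a cylinder number) or null
    next: Optional[int] = None   # forward pointer (a cylinder number) or null
    linked: bool = False
    collected: bool = False      # LCT Collected Flag
    archive: bool = False        # LCT Archive Flag

    @property
    def access_data_valid(self) -> bool:
        # Access Counter and Last Access Date/Time are valid if either flag is set.
        return self.collected or self.archive

class FreeSpaceList:
    """Doubly linked free space list for one logical device."""

    def __init__(self, cylinders: int):
        self.entries = [FsdEntry() for _ in range(cylinders)]  # one per cylinder
        self.head: Optional[int] = None

    def unlink(self, cyl: int) -> None:
        e = self.entries[cyl]
        if not e.linked:
            return
        if e.prev is None:
            self.head = e.next
        else:
            self.entries[e.prev].next = e.next
        if e.next is not None:
            self.entries[e.next].prev = e.prev
        e.prev = e.next = None
        e.linked = False

    def insert(self, cyl: int, free_sectors: int) -> None:
        """(Re)link `cyl`, keeping the list in descending free-sector order."""
        self.unlink(cyl)
        e = self.entries[cyl]
        e.free_sectors = free_sectors
        prev, cur = None, self.head
        while cur is not None and self.entries[cur].free_sectors >= free_sectors:
            prev, cur = cur, self.entries[cur].next
        e.prev, e.next, e.linked = prev, cur, True
        if prev is None:
            self.head = cyl
        else:
            self.entries[prev].next = cyl
        if cur is not None:
            self.entries[cur].prev = cyl

    def add_free(self, cyl: int, sectors: int) -> None:
        """Increase a cylinder's free space and reorder the list if necessary."""
        self.insert(cyl, self.entries[cyl].free_sectors + sectors)

    def order(self) -> List[int]:
        out, cur = [], self.head
        while cur is not None:
            out.append(cur)
            cur = self.entries[cur].next
        return out
```

Because every cylinder has a fixed positional entry, unlinking and relinking a cylinder needs no search for the entry itself; only the new position in the sorted list has to be found.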

The collection of free space is a background process that is implemented in the disk drive array data storage subsystem 100. The free space collection process makes use of the logical cylinder directory, which is a list contained in the last few sectors of each logical cylinder, indicative of the contents of that logical cylinder. The logical cylinder directory contains an entry for each virtual track instance contained within the logical cylinder. The entry for each virtual track instance contains the identifier of the virtual track instance and the identifier of the relative sector within the logical cylinder in which the virtual track instance begins. From this directory and the virtual track directory, the free space collection process can determine which virtual track instances are still current in this logical cylinder and therefore need to be moved to another location to make the logical cylinder available for writing new data.
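The validity test described above can be sketched in Python. An LCD entry is current only if the virtual track directory still points at exactly that cylinder and starting sector. Otherwise a newer instance has been written elsewhere, and the old sectors are already free. The simplified VTD (a dictionary of `Location` tuples) and the function names are assumptions. `collect` updates only the mapping and does not copy the track data.

```python
from typing import Dict, List, NamedTuple, Tuple

class Location(NamedTuple):
    logical_cylinder: int
    relative_sector: int

def current_instances(cylinder: int, lcd: List[Tuple[int, int]],
                      vtd: Dict[int, Location]) -> List[int]:
    """Virtual tracks whose instance in `cylinder` is still the present one.

    lcd holds (virtual_track, relative_sector) pairs read from the cylinder's LCD.
    """
    return [vt for vt, sector in lcd if vtd.get(vt) == Location(cylinder, sector)]

def collect(cylinder: int, lcd: List[Tuple[int, int]], vtd: Dict[int, Location],
            target_cylinder: int, sizes: Dict[int, int]) -> int:
    """Remap current instances back to back into target_cylinder and return the
    number of sectors used (data movement itself is omitted)."""
    next_sector = 0
    for vt in current_instances(cylinder, lcd, vtd):
        vtd[vt] = Location(target_cylinder, next_sector)
        next_sector += sizes[vt]
    return next_sector
```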

Mapping Tables 

It is necessary to accurately record the location of all data within the disk drive array data storage subsystem 100, since the data received from the data processors 2-2' is mapped from its address in the virtual space to a physical location in the subsystem in a dynamic fashion. A virtual track directory is maintained to recall the location of the present instance of each virtual track in disk drive array data storage subsystem 100. Changes to the virtual track directory are journaled to a nonvolatile store and are backed up with fuzzy image copies to safeguard the mapping data. The virtual track directory 3 consists of an entry 300 (FIG. 9) for each virtual track which the associated data processor 2 can address. The virtual track directory entry 300 also contains data 307 indicative of the length of the virtual track instance in sectors. The virtual track directory 3 is stored in noncontiguous pieces of the cache memory 113 and is addressed indirectly through pointers in a virtual device table. The virtual track directory 3 is updated whenever a new virtual track instance is written to the disk drives.
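The indirect addressing described above, in which a virtual device table points to noncontiguous directory segments that each cover 64 cylinders, can be sketched as a two-level lookup in Python. The value of 15 tracks per cylinder reflects IBM 3380 geometry and is a parameter of this sketch. The class and method names are hypothetical, and each entry is held as a plain integer, such as a packed 64-bit word.

```python
from typing import List, Optional, Tuple

CYLINDERS_PER_SEGMENT = 64   # "64 cylinders' worth" of tracks per segment
TRACKS_PER_CYLINDER = 15     # IBM 3380 geometry; a parameter in this sketch
SEGMENT_LEN = CYLINDERS_PER_SEGMENT * TRACKS_PER_CYLINDER

class VirtualDevice:
    """Two-level VTD lookup: virtual device table -> segment -> entry."""

    def __init__(self, cylinders: int):
        # Allocate as many segments as needed to emulate `cylinders` cylinders.
        n_segments = -(-cylinders // CYLINDERS_PER_SEGMENT)   # ceiling division
        self.cylinders = cylinders
        # Each segment starts with no pointers: None means "not yet written".
        self.segment_pointers: List[List[Optional[int]]] = [
            [None] * SEGMENT_LEN for _ in range(n_segments)]

    def _locate(self, cylinder: int, head: int) -> Tuple[List[Optional[int]], int]:
        if not (0 <= cylinder < self.cylinders and 0 <= head < TRACKS_PER_CYLINDER):
            raise IndexError("virtual track address out of range")
        track = cylinder * TRACKS_PER_CYLINDER + head
        return self.segment_pointers[track // SEGMENT_LEN], track % SEGMENT_LEN

    def lookup(self, cylinder: int, head: int) -> Optional[int]:
        segment, index = self._locate(cylinder, head)
        return segment[index]

    def update(self, cylinder: int, head: int, entry: int) -> None:
        """Record the location of a newly written virtual track instance."""
        segment, index = self._locate(cylinder, head)
        segment[index] = entry
```

A 100-cylinder virtual volume therefore needs two segments. Only the small table of segment pointers has to be contiguous, and the segments themselves can sit anywhere in cache.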

The storage control also includes a free space directory 800 (FIG. 8), which is a list of all of the logical cylinders in the disk drive array data storage subsystem 100 ordered by logical device. Each logical device is cataloged in a list called a free space list 801 for the logical device; each list entry represents a logical cylinder and indicates the amount of free space that this logical cylinder presently contains. This free space directory contains a positional entry for each logical cylinder; each entry includes both forward 802 and backward 803 pointers for the doubly linked free space list 801 for its logical device and the number of free sectors contained in the logical cylinder. Each of these pointers 802, 803 points either to another entry in the free space list 801 for its logical device or is null. The collection of free space is a background process that is implemented in the disk drive array data storage subsystem 100. The free space collection process makes use of the logical cylinder directory, which is a list contained in the last few sectors of each logical cylinder indicative of the contents of that logical cylinder. The logical cylinder directory contains an entry for each virtual track instance contained within the logical cylinder. The entry for each virtual track instance contains the identifier of the virtual track instance and the identifier of the relative sector within the logical cylinder in which the virtual track instance begins. From this directory and the virtual track directory, the free space collection process can determine which virtual track instances are still current in this logical cylinder and therefore need to be moved to another location to make the logical cylinder available for writing new data.

Data Move/Copy Operation

The data record move/copy operation instantaneously relocates or creates a second instance of a selected data record by merely generating a new pointer to reference the same physical memory location as the original reference pointer in the virtual track directory. In this fashion, by simply generating a new pointer referencing the same physical memory space, the data record can be moved/copied.

This apparatus instantaneously moves the original data record without the time penalty of having to download the data record to the cache memory 113 and write the data record to a new physical memory location. For the purpose of enabling a program to simply access the data record at a different virtual address, the use of this mechanism provides a significant time advantage. A physical copy of the original data record can later be written as a background process to a second memory location, if so desired. Alternatively, when one of the programs that can access the data record writes data to or modifies the data record in any way, the modified copy of a portion of the original data record is written to a new physical memory location and the corresponding address pointers are changed to reflect the new location of this rewritten portion of the data record.

In this fashion, a data record can be instantaneously moved/copied by simply creating a new memory pointer, and the actual physical copying of the data record can take place either as a background process or incrementally as necessary when each virtual track of the data record is modified by one of the programs that accesses the data record. This data record copy operation can be implemented in a number of different ways. A first method of manipulating memory pointers is to use a lookaside copy table which functions as a map to be used by the data storage subsystem 100 to list all the data records that are accessible by more than one virtual address. A second method of manipulating data record pointers is to provide additional data in the virtual track directory 3 in order to record the copy status of each data record therein. These two methods each have advantages and disadvantages in the implementation of the data record pointer management function and are disclosed herein as implementations illustrative of the concept of this invention.
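The pointer-based move/copy idea can be illustrated with a toy Python model. A "copy" only adds a second directory pointer to the same physical location. A later write goes to a fresh location and changes only the writer's pointer, so the other address keeps the original data. This is a simplified sketch with hypothetical names. It does not track the copy status that the two methods above are designed to maintain, and it rewrites whole tracks rather than portions.

```python
from itertools import count
from typing import Dict

class PointerStore:
    """Toy model: virtual tracks map to physical locations; copies share them."""

    def __init__(self) -> None:
        self.physical: Dict[int, bytes] = {}   # physical location -> track data
        self.vtd: Dict[int, int] = {}          # virtual address -> physical location
        self._locations = count()

    def write(self, vaddr: int, data: bytes) -> None:
        # Every write lands at a fresh physical location and only this
        # address's pointer changes, so sharers of the old location are unaffected.
        loc = next(self._locations)
        self.physical[loc] = data
        self.vtd[vaddr] = loc

    def read(self, vaddr: int) -> bytes:
        return self.physical[self.vtd[vaddr]]

    def snapshot_copy(self, src: int, dst: int) -> None:
        # "Copy" = new pointer to the same physical location; no data moves.
        self.vtd[dst] = self.vtd[src]

    def move(self, src: int, dst: int) -> None:
        # "Move" = hand the pointer to the new address and drop the old one.
        self.vtd[dst] = self.vtd.pop(src)
```

After `write(1, b"A")` and `snapshot_copy(1, 2)`, both addresses read `b"A"` from a single physical location. A subsequent `write(2, b"B")` leaves address 1 unchanged.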

Copy Table Implementation 

Each entry 300 in the Virtual Track Directory (VTD) 3 contains two flags associated with the Copy/Move function. The "Source" flag 306 is set whenever a Virtual Track Instance at this Virtual Track Address has been the origin of a copy or move. The Virtual Track Instance pointed to by this entry 300 is not necessarily the Source, but the Virtual Track Instance contains this Virtual Address. If the Source flag 306 is set, there is at least one entry in the Copy Table 400 (FIG. 4) for this Virtual Address. The "Target" flag 303 is set whenever a Virtual Track Instance contains data that has been the destination of a copy or move. If the Target flag 303 is set, the Virtual Address in the Virtual Track Instance that is pointed to is not that of the Virtual Track Directory Entry 300.

The format of the Copy Table 400 is illustrated graphically in FIG. 4. The preferred implementation is to have a separate lookaside Copy Table 400 for each Logical Device, so that there is a Copy Table head 401 and tail 402 pointer associated with each Logical Device; however, the copy table 400 could just as easily be implemented as a single table for the entire data storage subsystem 100. In either case, the changes to the copy table 400 are journaled as noted above for the virtual track directory. The copy table is ordered such that the sources 410 are in ascending Logical Address order. The copy table 400 is a singly linked list of Sources 410, where each Source (such as 410) is the head of a linked list of Targets 411, 412. The Source Entry 410 contains the following data:



    Logical Address (VTD Entry Copy)
    Virtual Address
    Next Source Pointer (NULL if last Source in list)
    Target Pointer

The Target Entry 411 contains the following data:

    Virtual Address
    Next Target Pointer (NULL if last Target in list)
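The Source and Target entries listed above translate directly into linked nodes. The following Python sketch keeps the Source list in ascending Logical Address order and appends Targets at the tail of each Source's Target List. The class and method names are illustrative, and the per-device tail pointer 402 is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TargetEntry:
    virtual_address: int
    next_target: Optional["TargetEntry"] = None    # NULL if last Target in list

@dataclass
class SourceEntry:
    logical_address: int                           # copy of the VTD entry's address
    virtual_address: int
    next_source: Optional["SourceEntry"] = None    # NULL if last Source in list
    targets: Optional[TargetEntry] = None          # Target Pointer

class CopyTable:
    """Lookaside copy table for one logical device."""

    def __init__(self) -> None:
        self.head: Optional[SourceEntry] = None

    def find_source(self, logical_address: int) -> Optional[SourceEntry]:
        # The list is sorted, so the scan can stop at the first larger address.
        src = self.head
        while src is not None and src.logical_address < logical_address:
            src = src.next_source
        if src is not None and src.logical_address == logical_address:
            return src
        return None

    def insert_source(self, entry: SourceEntry) -> SourceEntry:
        """Link a Source into its place in ascending Logical Address order."""
        prev, cur = None, self.head
        while cur is not None and cur.logical_address < entry.logical_address:
            prev, cur = cur, cur.next_source
        entry.next_source = cur
        if prev is None:
            self.head = entry
        else:
            prev.next_source = entry
        return entry

    @staticmethod
    def append_target(src: SourceEntry, target: TargetEntry) -> None:
        """Link a Target after the last Target in this Source's Target List."""
        if src.targets is None:
            src.targets = target
            return
        last = src.targets
        while last.next_target is not None:
            last = last.next_target
        last.next_target = target

    def source_addresses(self) -> List[int]:
        out, s = [], self.head
        while s is not None:
            out.append(s.logical_address)
            s = s.next_source
        return out
```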
Snapshot Copy Operation Using Copy Table
FIG. 5 illustrates in flow diagram form the operational steps taken by data storage subsystem 100 to produce a copy of a virtual track instance (also referred to as a data record) using the copy table implementation of the snapshot copy operation. When data processor 2 transmits a data copy request to data storage subsystem 100 over data link 21 at step 501, the control software in processor 204-0, for example, translates the received data copy request into an identification of a particular virtual track directory entry 300 stored in cache memory 113 at step 502. Processor 204-0 in data storage subsystem 100 verifies at step 503 that the extents are defined, are the same length and do not overlap. The cache management software ensures at step 504 that all the tracks in this target extent are cleared and available for the copy operation. Processor 204-0 reads the virtual track directory entry 300 and creates at step 505 a copy of this entry to be used as the virtual track directory entry for the target virtual track. At step 506 processor 204-0 sets the source 306 and target 303 flags respectively in the original and copied virtual track directory entries. Processor 204-0 then writes at step 507 the updated virtual track directory entry for the source virtual track back into the virtual track directory 3, as well as the new virtual track directory entry for the target virtual track into the virtual track directory 3. At step 508, a determination is made whether the source virtual track is already listed in copy table 400. If the source virtual track is not already a source or a target virtual track in copy table 400, then both a source entry and a target entry are created by processor 204-0 and written at step 509 into copy table 400 in the form noted above with respect to FIG. 4. If the source data record was already marked as a source or a target data record in copy table 400, then copy table 400 is scanned at step 510 in order to locate this entry, and the target entry is added to this linked list to create a new target for this source data record. A more specific recitation of this process is illustrated in the following pseudo code:



    Read VTD Entry for Source
    Set Source Flag in VTD Entry
    Write updated VTD Entry for Source back to VTD
    Set Target Flag in a copy of the VTD Entry
    Read VTD Entry for Target
    Increase Free Space for Cylinder pointed to by old VTD Entry
    Reorder Free List, if necessary
    Write updated VTD Entry to Target location in VTD
    Create Target Entry for the Copy Table
    Move the Update Count Fields Flag from the command to the Target Entry
    If Source is NOT already a Source or a Target
        Create Source Entry for Copy Table
        Link Source into proper location in Copy Table
        Link Target Entry to Source Entry in Copy Table
    Elseif (New Source was already marked as Source)
        Scan Source List to find Source in Copy Table
        If find Source
            Link Target to Last Target in this Source's Target List
        Else (scanned to end of Source List)
            Create Source Entry for Copy Table
            Link Source into proper location in Source List
            Link Target Entry to Source Entry in Copy Table
        Endif
    Else (New Source was already in Copy Table as Target)
        Scan Source List to find Logical Address of Target
        Link Target Entry to Last Target in this Source's Target List
    Endif
    Journal the changes to the VTD and to the Copy Table
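The procedure above can be condensed into a runnable Python sketch. Here the copy table is a dictionary keyed by the Source's logical address, with a list of Target virtual addresses standing in for the linked lists of FIG. 4. Because a track that is itself a Target carries its Source's logical address, the three branches of the pseudo code all reduce to "find or create the Source for this logical address, then link the Target last". Clearing the Source flag in the target's copy of the entry is an assumption of this sketch, and journaling is omitted. All names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

@dataclass(frozen=True)
class VtdEntry:
    logical_address: int      # location of the virtual track instance
    source: bool = False
    target: bool = False

def snapshot_copy(vtd: Dict[int, VtdEntry], copy_table: Dict[int, List[int]],
                  freed: List[int], src: int, dst: int) -> None:
    """Copy virtual track `src` to `dst` by pointer manipulation only."""
    if src == dst:
        raise ValueError("source and target extents must not overlap")
    entry = vtd[src]
    # Set Source Flag and write the updated entry back to the VTD.
    vtd[src] = replace(entry, source=True)
    # As in the pseudo code, the target's old instance (if any) becomes free space.
    old = vtd.get(dst)
    if old is not None:
        freed.append(old.logical_address)
    # The target receives a copy of the entry with the Target Flag set.
    vtd[dst] = replace(entry, source=False, target=True)
    # Find or create the Source by logical address; link the Target last.
    copy_table.setdefault(entry.logical_address, []).append(dst)
```

Copying a track that is already a target (here, copying 2 to 3 after copying 1 to 2) appends the new target to the original source's list, matching the final `Else` branch of the pseudo code.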



Moving a data record without a copy operation is functionally similar to the snapshot copy operation described above. A significant difference is that the virtual track directory entry 300 contains a NULL pointer in the virtual track address 320 to indicate that this virtual address does not contain any data, and the source flag bit 306 is set to indicate that this virtual address is still a source. The following pseudo code listing indicates an instant move operation for a target data record, to highlight the difference between this operation and the above noted data record copy operation:
    Read VTD Entry for New Target
    Increase Free Space for Cylinder pointed to by old VTD Entry
    Reorder Free List if necessary
    Read VTD Entry for the Source
    Set Source Flag in VTD Entry
    Write a NULL pointer into the Logical Address Pointer of the VTD Entry
    Write Updated VTD for Source back to VTD
    Write an unmodified copy of the old VTD Entry to Target location in VTD
        (This entry already has the Target Flag set)
    Scan Source List for the Logical Address in the VTD Entry for the Target
    Scan Target List to find this Target in Target List
    Update the Target Entry to the address the data was moved to
    Move the Update Count Fields Flag from the command to the Target Entry
    Journal the changes to the VTD and to the Copy Table
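A Python sketch of this instant move follows, using the same simplified representation as a dictionary-based copy table keyed by Source logical address. `None` plays the role of the NULL logical address pointer. The vacated address keeps its Source flag, and the Target entry in the copy table is repointed at the new virtual address. The names are hypothetical, and the sketch covers only the case shown, moving a record that is already a Target.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

@dataclass(frozen=True)
class VtdEntry:
    logical_address: Optional[int]   # None stands for the NULL pointer
    source: bool = False
    target: bool = False

def instant_move(vtd: Dict[int, VtdEntry], copy_table: Dict[int, List[int]],
                 freed: List[int], old_vaddr: int, new_vaddr: int) -> None:
    """Move a Target data record from old_vaddr to new_vaddr without copying data."""
    old_entry = vtd[old_vaddr]
    if not old_entry.target:
        raise ValueError("this sketch only covers moving a Target data record")
    displaced = vtd.get(new_vaddr)
    if displaced is not None and displaced.logical_address is not None:
        freed.append(displaced.logical_address)   # increase free space for old cylinder
    # The vacated address holds no data but is still marked as a Source.
    vtd[old_vaddr] = replace(old_entry, logical_address=None, source=True)
    # The new address gets an unmodified copy (Target Flag already set).
    vtd[new_vaddr] = old_entry
    # Find the Source by logical address, find this Target, and repoint it.
    targets = copy_table[old_entry.logical_address]
    targets[targets.index(old_vaddr)] = new_vaddr
```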



Virtual Track Directory Copy Implementation 
This second method of managing the data pointers makes use of an expanded virtual track directory 3, which increases each entry 300 to allow room for a virtual track address 320 that consists of copy virtual device number 308, copy virtual cylinder number 309 and copy virtual head number 310 elements, which act as a pointer to another virtual track that was copied from the first virtual track. The virtual track directory entry for the track pointed to from the first virtual track directory entry contains the same logical address as the first and contains the virtual track address of the next virtual track directory entry in the chain of target data records. Thus, multiple tracks copied from a single source track are identified by a singly linked list that loops back to itself at the source track to form a synonym ring of pointers. In effect, the virtual track directory itself contains an embedded copy table instead of using the lookaside copy table described above. Theoretically, any number of copies of a single track can be made using this method, since the virtual track directory entries are simply linked together in ring form. As a management construct, the number of copies can be limited to a predetermined number and, if a user requests further copies to be made, a second set of copies can be created by staging the data record from the backend data storage devices to make a second physical copy in cache memory 113, which can be used as the basis of a second ring, in order to keep the length of each ring at a reasonably manageable number.
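The synonym ring can be sketched in Python by giving each directory entry a `next_copy` field (standing in for virtual track address 320, made up of device, cylinder and head). Following the pointers from any member eventually loops back to the start. The new copy is spliced in directly after the source, which is one possible choice the text does not prescribe. `MAX_RING` is a hypothetical value for the predetermined limit. When it is reached, `add_copy` reports failure so that a caller could start a second ring from a second physical copy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

VTA = Tuple[int, int, int]   # (copy virtual device, cylinder, head): fields 308-310
MAX_RING = 4                 # hypothetical predetermined limit on ring length

@dataclass
class RingEntry:
    logical_address: int
    next_copy: Optional[VTA] = None   # None: track is not part of a ring
    source: bool = False
    target: bool = False

def ring_members(vtd: Dict[VTA, RingEntry], start: VTA) -> List[VTA]:
    """Follow next_copy pointers until the ring loops back to `start`."""
    members, cur = [start], vtd[start].next_copy
    while cur is not None and cur != start:
        members.append(cur)
        cur = vtd[cur].next_copy
    return members

def add_copy(vtd: Dict[VTA, RingEntry], src: VTA, dst: VTA) -> bool:
    """Splice `dst` into the ring directly after `src`; every member shares the
    same logical address. Returns False if the ring is already at MAX_RING."""
    s = vtd[src]
    if len(ring_members(vtd, src)) >= MAX_RING:
        return False
    s.source = True
    vtd[dst] = RingEntry(logical_address=s.logical_address,
                         next_copy=s.next_copy if s.next_copy is not None else src,
                         target=True)
    s.next_copy = dst
    return True
```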

The operation of the virtual track directory implementation is illustrated in flow diagram form in FIG. 17. At step 1001, the data storage subsystem receives a copy request from data processor 2 over data channel 21. Processor 204-0 in data storage subsystem 100 verifies at step 1002 that the extents are defined, are the same length and do not overlap. The cache management software ensures at step 1003 that all the tracks in this target extent are cleared and available for the copy operation. This is explained in further detail in the following pseudo code:



    For each track in source extent, search cache
        IF track is found
            Mark track as 'Copy Loop Track'
            IF track is modified
                CALL Copy Modified Track service routine
                    PASS Source Virtual Track Address
                    PASS Target Virtual Track Address
                    Function forms Copy Loop in VTD and marks target as 'No Backend Address'
                    IF Copy Loop is below Max Size
                        RETURN (SUCCESS)   ; No action necessary
                    ELSE (Loop too big; need to break)
                        Mark Target as Pseudosource in VTD Entry
                        RETURN (Cache Copy to Target Address and Destage Target)
                    ENDIF
                RECEIVE Status
                Cache must do the following:
                IF status is Cache Copy to Target and Destage Target
                    Do not search for track - target can't be in cache
                    Do a Cache to Cache Copy of the source
                    Load the copy with the target address
                    Schedule the Destage of the track
                ENDIF
            ENDIF
        ENDIF
    ENDFOR
Once this operation is completed, the source and target virtual track directory entries are updated at step 1004 to indicate their status as source and target, respectively, and the virtual track address information contained therein is modified at step 1005 to indicate that both of these virtual track directory entries are part of a copy loop.

In order to limit the length of the singly linked list of source and target tracks in the copy operation, the length of the copy list is checked at step 1006 and, if it is less than a predetermined limit, the task is completed. If the copy list exceeds this predetermined limit, then at step 1007 a second copy loop is created as described in the following copy count management code:

    IF the loop is bigger than limit
        Set 'Hold Off VCKD Response' flag
        Increment Copy Notify Count in Copy Command in Virtual Device Table
        IF any target is marked as Modified or as a pseudosource
            Mark Pseudosource as 'Notify when Destaged'
            Destage Task will tell Copy Task when the destage is complete
            and the Loop Size is reduced
            CALL Destage Track Cache function
                PASS Virtual Track Address
                PASS No Response Indicator
        ELSE (No tracks are modified)
            Mark Target as Pseudosource in VTD Entry (Set Source and Target)
            Mark Target (Pseudosource) as 'Notify when Destaged'
            CALL Stage and Destage Track cache service routine
                PASS Target Address
            Cache SW must hash to the passed address.
            IF the track is in cache
                Schedule the Destage of the track
            ELSE (Track is not in cache)
                Schedule the Stage of the track
                Once track is in cache, immediately schedule the Destage of the track
            ENDIF
            When track is destaged, Destage Task breaks Copy Loop into two
            Copy Loops with pseudosource as new source, and returns response
            to Copy Task.
        ENDIF
    ENDIF



Staging and Destaging of Copy Loop Tracks 
When a track is to be updated in cache memory 113, it must be determined whether this track is part of a copy loop. This ensures that the integrity of the multiple copies of this track is maintained and that only the appropriate copies of this track are modified, according to the following procedure:



    IF the track is not a Copy Loop track
        RETURN (SUCCESS)
        No action required by cache SW
    ELSEIF (the track is a Target)
        IF the track is marked as a Pseudosource in VTD
            RETURN (Do Not Update - Track Being Scheduled for Destage)
        ELSE
            Mark track as 'Modified In Cache' in VTD Entry
            RETURN (SUCCESS)
            No action required by cache SW
        ENDIF
    ELSE (the track is a Source)
        Scan Copy Loop to find an unmodified target
        IF unmodified target found
            Could be marked 'No Backend Address'
            Mark Target as Pseudosource in VTD Entry (Set Source and Target)
            RETURN (Cache Copy to Returned Address and Destage Returned Track)
            Cache SW must hash to the returned address.
            IF the track is in cache
                Schedule the Destage of the track
            ELSE (Track is not in cache)
                DO a Cache to Cache Copy of the source
                Load the copy with the returned address (Pseudosource address)
                Schedule the Destage of the track
            ENDIF
            Go ahead with modifications to the source
        ELSE (unmodified target not found)
            RETURN (Target List with an indicator to Destage Targets)
            Cache SW must hash to the passed addresses and
            schedule the Destage of all those tracks.
            Destage will not allow the source track to be destaged
            until all the targets are destaged first
            The Cache SW can go ahead with modifications to the source
        ENDIF
    ENDIF
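The decision logic of the update procedure can be rendered as a small Python function. It returns a label describing the action the cache software must take and sets the flags that the pseudo code sets. The `Track` class and the returned strings are invented for this sketch. The cache-to-cache copy and destage scheduling themselves are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Track:
    address: int
    in_copy_loop: bool = False
    is_target: bool = False
    pseudosource: bool = False
    modified_in_cache: bool = False

def before_update(track: Track, loop: List[Track]) -> str:
    """Decide what must happen before `track` is modified in cache.

    `loop` holds the other members (targets) of the track's copy loop.
    """
    if not track.in_copy_loop:
        return "SUCCESS"
    if track.is_target:
        if track.pseudosource:
            return "DO_NOT_UPDATE"          # track is being scheduled for destage
        track.modified_in_cache = True
        return "SUCCESS"
    # Source: preserve the current data in an unmodified target before modifying.
    spare: Optional[Track] = next((t for t in loop if not t.modified_in_cache), None)
    if spare is not None:
        spare.pseudosource = True           # Set Source and Target on the spare
        return f"CACHE_COPY_AND_DESTAGE:{spare.address}"
    # Every target is modified: all of them must be destaged before the source.
    return "DESTAGE_TARGETS:" + ",".join(str(t.address) for t in loop)
```

Promoting an unmodified target to pseudosource is what lets the source be modified immediately. The pseudosource keeps the old vintage of the data until it is destaged.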



As can be seen from this pseudo code, a cache to cache copy of the track must be made if this track is a source in a copy loop, in order to ensure that the noted copies of this track are maintained at their present status and not corrupted by the modification to the original track performed by cache memory 113. Similarly, the destaging of copy loop tracks is performed in a manner that maintains the integrity of the copy loop and ensures that the proper vintage of data is written to the appropriate physical location in the backend data storage for the designated virtual address.

FIG. 18 illustrates in flow diagram form the operational steps taken by processor 204-0 when at step 1101 it schedules the writing of a virtual track that is a target to the backend storage 103. At step 1102, a check is made of the copy list to determine whether the target is the only target for the associated source. If not, at step 1103, the target is destaged from cache memory 113 to backend storage 103 and the copy virtual track address 320 of the previous track in the copy list is updated to reflect the deletion of this target from the copy list. The target flag for the written target is reset at step 1107 to reflect the deletion of this target from the copy list. If this is the last target, at step 1104 it is destaged from cache memory 113 to backend storage 103. At step 1106, the source copy virtual track address 320 is deleted, and the source and target flags are reset at step 1108 in the corresponding virtual track directory entries. The destaging algorithm for the copy list is described using the following pseudo code:

    IF track is a source
        There must be at least one target for track to be a source
        IF all targets marked as 'No Backend Address'
            Write track to DASD
            Put Logical Address in Source and Targets in VTD
            Remove 'No Backend Address' indication
        ELSEIF (any Targets marked as 'Modified in Cache'
                AND Not marked 'Scheduled for Destage')
            Create TCB containing:
                Destage Failure Indicator
                Targets Not Destaged First Indicator
                Addresses of targets modified in cache and not scheduled for destage
            CALL Cleanup Track cache service routine
                PASS TCB Pointer
            The Targets may in fact have been scheduled for destage
            following the scheduling of the source.
            Cache SW must do following:
            FOR (All target addresses returned to cache)
                IF track not scheduled for destage
                    CALL Destage Track Request
                        PASS Returned Target Address
                ENDIF
            ENDFOR
            CALL Destage Track Request
                PASS Source Address
        ELSE (any Targets marked as 'Modified in Cache'
              AND marked 'Scheduled for Destage')
            Put Request on Destage Blocked Queue marked as
                'Do Not Destage until Modified Target Destaged'
        ENDIF
    ELSEIF (track is a pseudosource)
        Write track to DASD
        Update VTD to mark track as source
        Unlink previous source from Copy Loop
        Update Target Physical Addresses to new source location
        IF track marked as 'Notify when Destaged'
            CALL Notify Copy Task function
                PASS Target Virtual Address
        ENDIF
    ELSEIF (track is a target)
        Must be modified
        Update VTD to mark track as 'Scheduled for Destage'
        Write track to DASD
        IF track marked as 'Notify when Destaged'
            CALL Notify Copy Task function
                PASS Target Virtual Address
        ENDIF
        IF (this is the last modified target in the Copy Loop
            AND source is in Destage Blocked Queue marked as
            'Do Not Destage until Modified Target Destaged')
            Move source request to Destage Request Queue
            Write source track to DASD
        ENDIF
    ENDIF
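The FIG. 18 steps for destaging a target can be sketched in Python on a ring of directory entries linked through `next_copy` (standing in for copy virtual track address 320). If other targets remain, the written target is bypassed in the ring and its target flag is cleared. If it is the last target, both the source's and the target's links and flags are cleared. The `written` list is a stand-in for the actual write to backend storage 103, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Entry:
    next_copy: Optional[int] = None   # copy virtual track address 320; None = unlinked
    source: bool = False
    target: bool = False

def destage_target(vtd: Dict[int, Entry], source: int, target: int,
                   written: List[int]) -> None:
    """Write `target` to backend storage and remove it from the source's copy ring."""
    ring = [source]
    cur = vtd[source].next_copy
    while cur is not None and cur != source:
        ring.append(cur)
        cur = vtd[cur].next_copy
    if target not in ring[1:]:
        raise ValueError("target is not in this source's copy list")
    written.append(target)                               # steps 1103 / 1104
    if len(ring) > 2:                                    # step 1102: other targets remain
        prev = ring[ring.index(target) - 1]
        vtd[prev].next_copy = vtd[target].next_copy      # bypass the written target
        vtd[target].next_copy = None
        vtd[target].target = False                       # step 1107
    else:                                                # this was the last target
        vtd[source].next_copy = None                     # step 1106
        vtd[target].next_copy = None
        vtd[source].source = False                       # step 1108
        vtd[target].target = False
```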
As can be seen from these routines, care must be taken not to intermingle various versions of the virtual track instances as the copy loop is created, expanded and contracted by the movement of data into and out of cache memory 113 and the appropriate backend storage. A corresponding destaging process is executed for a copy table implementation of the pointer management.
Data Read Operation 

FIGS. 6 and 7 illustrate in flow diagram form the operational steps taken by processor 204 in control unit 101 of the data storage subsystem 100 to read data from a data redundancy group 122-1 to 122-n+m in the disk drive subsets 103. The disk drive array data storage subsystem 100 supports reads of any size. However, the logical layer only supports reads of virtual track instances. In order to perform a read operation, the virtual track instance that contains the data to be read is staged from the logical layer into the cache memory 113. The data record is then transferred from the cache memory 113, and any clean up is performed to complete the read operation.
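The read path can be summarized in a Python sketch. The steps are: check the cache, look up the virtual-to-logical mapping, retrieve the logical cylinder from secondary media if it has been migrated, stage the track into cache, and update the Logical Cylinder Table access data when the Collected Flag is set. The callbacks `read_cylinder`, `retrieve_from_secondary` and `now` are hypothetical stand-ins for the hardware operations. Error detection and correction (steps 611-612) and the Retrieving flag are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class VtdEntry:
    logical_cylinder: Optional[int]          # None while the track is on secondary media
    secondary_pointer: Optional[int] = None

@dataclass
class LctEntry:
    collected: bool = False
    access_counter: int = 0
    last_access: float = 0.0

def read_virtual_track(vtrack: int,
                       cache: Dict[int, bytes],
                       vtd: Dict[int, VtdEntry],
                       lct: Dict[int, LctEntry],
                       read_cylinder: Callable[[int, int], bytes],
                       retrieve_from_secondary: Callable[[Optional[int]], int],
                       now: Callable[[], float]) -> bytes:
    if vtrack in cache:                      # steps 602-604: already staged
        return cache[vtrack]
    entry = vtd[vtrack]                      # step 605: virtual-to-logical lookup
    if entry.logical_cylinder is None:       # step 620: archived on secondary media
        # steps 621-625: bring the logical cylinder into a reserved cylinder
        entry.logical_cylinder = retrieve_from_secondary(entry.secondary_pointer)
        entry.secondary_pointer = None
    data = read_cylinder(entry.logical_cylinder, vtrack)   # stage into cache
    lc = lct[entry.logical_cylinder]
    if lc.collected:                         # steps 630-631
        lc.access_counter += 1
        lc.last_access = now()
    cache[vtrack] = data
    return data                              # step 617: transfer to the requester
```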

At step 601, the control unit 101 prepares to read a 
record from a virtual track. At step 602, the control unit 
101 branches to the cache directory search subroutine 
to assure that the virtual track is located in the cache 
memory 113 since the virtual track may already have 
been staged into the cache memory 113 and stored 
therein in addition to having a copy stored on the plu- 
rality of disk drives (122-1 to 122-n+m) that constitute 
the redundancy group in which the virtual track is 
stored. At step 603, the control unit 101 scans the hash 
table directory of the cache memory 113 to determine 
whether the requested virtual track is located in the cache memory 113. If it is, at step 604 control returns back to the main read operation routine and the cache staging subroutine that constitutes steps 605-616 is terminated.

Assume, for the purpose of this description, that the virtual track that has been requested is not located in the cache memory 113. Processing proceeds to step 605 where the control unit 101 looks up the address of the virtual track in the virtual to logical map table. At step 620, control unit 101 determines whether the requested virtual track resides on secondary media by reviewing the contents of the virtual track directory as described above. If the requested virtual track is not on secondary media, processing advances to step 606 as described below.

Retrieve Logical Cylinder From Secondary Media

If the requested virtual track is archived on secondary media, control unit 101 branches to step 621 where it reads the secondary media directory, located in cache memory 113, to obtain the pointer indicative of the physical location of the requested virtual track, for example on magnetic tape 10A. At step 622, control unit 101 obtains an unused logical cylinder in disk drive array 100 to store the logical cylinder containing the requested virtual track that is to be retrieved from the secondary media. At step 623, control unit 101 sets the Retrieving flag in the secondary media directory to indicate that the logical cylinder is in the process of being transferred from the secondary media. Control unit 101 reads the logical cylinder containing the requested virtual track from its location in the secondary media to the reserved logical cylinder. Once the requested logical track has been transferred to the reserved logical cylinder, at steps 624 and 625, control unit 101 updates the status of this logical cylinder in the secondary media directory and virtual track directory, respectively.

Logical Track Staging

The control unit 101 allocates space in cache memory 113 for the data and relocates the logical address to the cache directory. At step 606, the logical map location is used to map the logical device to one or more physical devices in the redundancy group. At step 607, the control unit 101 schedules one or more physical read operations to retrieve the virtual track instance from appropriate ones of identified physical devices 122-1 to 122-n+m. At step 608, the control unit 101 clears errors for these operations. At step 609, a determination is made whether all the reads have been completed, since the requested virtual track instance may be stored on more than one of the N+M disk drives in a redundancy group. If all of the reads have not been completed, processing proceeds to step 614 where the control unit 101 waits for the next completion of a read operation by one of the N+M disk drives in the redundancy group. At step 615 the next reading disk drive has completed its operation and a determination is made whether there are any errors in the read operation that has just been completed. If there are errors, at step 616 the errors are marked and control proceeds back to the beginning of step 609 where a determination is made whether all the reads have been completed. If at this point all the reads have been completed and all portions of the virtual track instance have been retrieved from the redundancy group, then processing proceeds to step 610 where a determination is made whether there are any errors in the reads that have been completed. If errors are detected, then at step 611 a determination is made whether the errors can be fixed. One error correction method is the use of a Reed-Solomon error detection/correction code to recreate the data that cannot be read directly. If the errors cannot be repaired, then a flag is set to indicate to the control unit 101 that the virtual track instance cannot be read accurately. If the errors can be fixed, then in step 612 the identified errors are corrected and processing proceeds to step 630 where a test of the Collected Flag in the Logical Cylinder Table (LCT) is made. If the Collected Flag is clear, steps 631 and 632 are skipped and processing proceeds to step 604. If the Collected Flag is set, processing proceeds to step 631 where the Logical Cylinder Access Counter is incremented and the Last Access Time/Date is loaded with the current time and date. Processing then returns back to the main routine at step 604 where a successful read of the virtual track instance from the redundancy group to the cache memory 113 has been completed.

At step 617, control unit 101 transfers the requested data record from the staged virtual track instance in which it is presently stored. Once the records of interest from the staged virtual track have been transferred to the data processor 2 that requested this information, then at step 618 the control unit 101 cleans up the read operation by performing the administrative tasks necessary to place all of the apparatus required to stage the virtual track instance from the redundancy group to the cache memory 113 into an idle state, and control returns at step 619 to service the next operation that is requested.

Data Write Operation

FIG. 13 illustrates in flow diagram form the operational steps taken by the disk drive array data storage subsystem 100 to perform a data write operation. The disk drive array data storage subsystem 100 supports writes of any size, but again, the logical layer only supports writes of virtual track instances. Therefore, in order to perform a write operation, the virtual track that contains the data record to be rewritten is staged from the logical layer into the cache memory 113. The modified data record is then transferred into the virtual track, and this updated virtual track instance is then scheduled to be written from the cache memory 113, where the data record modification has taken place, into the logical layer. Once the backend write operation is complete, the location of the obsolete instance of the virtual track is marked as free space. Any clean up of the write operation is then performed once this transfer and write is completed.

At step 701, the control unit 101 performs the set up for a write operation and at step 702, as with the read operation described above, the control unit 101 branches to the cache directory search subroutine to assure that the virtual track into which the data is to be transferred is located in the cache memory 113. Since all of the data updating is performed in the cache memory 113, the virtual track in which this data is to be written must be transferred from the redundancy group in which it is stored to the cache memory 113 if it is not already resident in the cache memory 113. The transfer of the requested virtual track instance to the cache memory 113 is performed for a write operation as it is described above with respect to a data read operation and constitutes steps 603-616 illustrated in FIG. 6 above.

At step 703, the control unit 101 transfers the modified record data received from host processor 11 into the virtual track that has been retrieved from the redundancy group into the cache memory 113 to thereby merge this modified record data into the original virtual track instance that was retrieved from the redundancy
group. Once this merge has been completed and the 
virtual track now is updated with the modified record 
data received from host processor 11, the control unit 
101 must schedule this updated virtual track instance to 
be written onto a redundancy group somewhere in the 
disk drive array data storage subsystem 100. 
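The cache staging subroutine (steps 603-625) and the in-cache merge of step 703 can be sketched in Python as follows. This is a minimal model under stated assumptions, not the control unit's implementation: the cache directory, the virtual to logical map, the redundancy groups, and the secondary media are plain dictionaries, every class and method name is hypothetical, and the multi-drive reads, Reed-Solomon correction, Retrieving flag, and Collected Flag bookkeeping are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    """Hypothetical stand-in for the staging structures used by control unit 101."""
    cache: dict = field(default_factory=dict)            # cache directory: virtual track -> bytearray
    vt_to_logical: dict = field(default_factory=dict)    # virtual to logical map
    on_secondary: set = field(default_factory=set)       # virtual tracks archived to secondary media
    logical_store: dict = field(default_factory=dict)    # logical address -> bytes (redundancy groups)
    secondary_store: dict = field(default_factory=dict)  # virtual track -> bytes (e.g. tape 10A)

    def stage(self, vt):
        """Return the cached instance of virtual track vt, staging it on a cache miss."""
        if vt in self.cache:                           # step 603: cache hit, return at step 604
            return self.cache[vt]
        if vt in self.on_secondary:                    # step 620: archived on secondary media?
            addr = ("reserved", vt)                    # step 622: hypothetical reserved cylinder address
            self.logical_store[addr] = self.secondary_store.pop(vt)  # read back from secondary media
            self.vt_to_logical[vt] = addr              # steps 624-625: update the directories
            self.on_secondary.discard(vt)
        addr = self.vt_to_logical[vt]                  # step 605: virtual to logical lookup
        self.cache[vt] = bytearray(self.logical_store[addr])  # steps 606-616: read into cache
        return self.cache[vt]

    def merge_write(self, vt, offset, record):
        """Steps 702-703: make sure the track is cached, then merge the modified record into it."""
        track = self.stage(vt)
        track[offset:offset + len(record)] = record
        return track
```

The merge changes only the cached copy; the instance held in the redundancy group is left untouched until the updated track is destaged to a new location.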

Scheduling of the updated virtual track instance is accomplished by the subroutine that consists of steps 705-710. At step 705, the control unit 101 determines whether the virtual track instance as updated fits into an available open logical cylinder. If it does not fit into an available open logical cylinder, then at step 706 this presently open logical cylinder must be closed out and written to the physical layer and another logical cylinder selected from the most free logical device or redundancy group in the disk drive array data storage subsystem 100. At step 707, the selection of a free logical cylinder from the most free logical device takes place. This ensures that the data files received from data processor 2 are distributed evenly across the plurality of redundancy groups in the disk drive array data storage subsystem 100, to avoid overloading certain redundancy groups while underloading other redundancy groups. Once a free logical cylinder is available, either being the presently open logical cylinder or a newly selected logical cylinder, then at step 708, the control unit 101 writes the updated virtual track instance into the logical cylinder and at step 709 the new location of the virtual track is placed in the virtual to logical map in order to render it available to the data processors 2-2'. At step 710, the control unit 101 marks the virtual track instance that is stored in the redundancy group as invalid in order to assure that the logical location at which this virtual track instance is stored is not accessed in response to another data processor 2' attempting to read or write the same virtual track. Since the modified record data is to be written into this virtual track in the cache memory 113, the copy of the virtual track that resides in the redundancy group is now inaccurate and must be removed from access by the data processors 2-2'. At step 711, control returns to the main routine, where at step 712 the control unit 101 cleans up the remaining administrative tasks to complete the write operation. At step 713, the processor 204 updates the free space directory to reflect the additional free space in the logical cylinder that contained the previous track instance and returns to an available state at step 714 for further read or write operations from data processor 2.
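Steps 705-713 amount to a log-structured write: an updated virtual track is always appended to the open logical cylinder and is never rewritten in place. The sketch below assumes a fixed cylinder size (the hypothetical `CYLINDER_SECTORS`) and interprets the "most free" logical device as the one with the most free cylinders; both choices, and all names, are illustrative assumptions rather than the subsystem's actual design.

```python
from dataclasses import dataclass, field

CYLINDER_SECTORS = 8  # hypothetical logical cylinder capacity, in sectors


@dataclass
class LogicalCylinder:
    device: int
    number: int
    used: int = 0                               # sectors already written
    tracks: dict = field(default_factory=dict)  # virtual track -> (start sector, length)


class TrackWriter:
    """Hypothetical sketch of steps 705-713: updated tracks always go to a new location."""

    def __init__(self, free_cylinders):
        # free_cylinders: logical device -> list of free cylinder numbers
        self.free = {dev: list(cyls) for dev, cyls in free_cylinders.items()}
        self.vt_map = {}   # virtual to logical map: track -> (device, cylinder, start, length)
        self.freed = {}    # (device, cylinder) -> sectors held by obsolete instances
        self.closed = []   # cylinders closed out to the physical layer
        self.open = self._select_cylinder()

    def _select_cylinder(self):
        # Step 707: take a free cylinder from the "most free" logical device.
        # Assumption: "most free" means the device with the most free cylinders.
        device = max(self.free, key=lambda dev: len(self.free[dev]))
        return LogicalCylinder(device, self.free[device].pop(0))

    def write_track(self, vt, length):
        if self.open.used + length > CYLINDER_SECTORS:   # step 705: does the track fit?
            self.closed.append(self.open)                # step 706: close out the open cylinder
            self.open = self._select_cylinder()          # step 707
        start = self.open.used                           # step 708: append to the open cylinder
        self.open.tracks[vt] = (start, length)
        self.open.used += length
        old = self.vt_map.get(vt)
        self.vt_map[vt] = (self.open.device, self.open.number, start, length)  # step 709
        if old is not None:                              # steps 710, 713: old instance becomes free space
            key = (old[0], old[1])
            self.freed[key] = self.freed.get(key, 0) + old[3]
```

Because the previous instance is never overwritten, the redundancy information of its redundancy group stays valid; the sectors it occupied are simply counted as free space for later collection.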
Free Space Collection 

When data in cache memory 113 is modified, it cannot be written back to its previous location on a disk drive in disk drive subsets 103 since that would invalidate the redundancy information on that logical track for the redundancy group. Therefore, once a virtual track has been updated, that track must be written to a new location in the data storage subsystem 100 and the data in the previous location must be marked as free space. As a result, in each redundancy group, the logical cylinders become riddled with "holes" of obsolete data in the form of virtual track instances that are marked as obsolete. In order to create completely empty logical cylinders for destaging, the valid data in partially valid cylinders must be read into cache memory 113 and rewritten into new, previously emptied logical cylinders. This process is called free space collection. The
free space collection function is accomplished by con- 
trol unit 101. Control unit 101 selects a logical cylinder 
that needs to be collected as a function of how much 
free space it contains. The free space determination is 
based on the free space directory as illustrated in FIG. 
8, which indicates the availability of unused memory in 
data storage subsystem 100. The table illustrated in 
FIG. 8 is a listing of all of the logical devices contained 
in data storage subsystem 100 and the identification of 
each of the logical cylinders contained therein. The 
entries in this chart represent the number of free physi- 
cal sectors in this particular logical cylinder. A write 
cursor is maintained in memory and this write cursor 
indicates the available open logical cylinder that control 
unit 101 will write to when data is destaged from cache 
113 after modification by associated data processor 2 or 
as part of a free space collection process. In addition, a 
free space collection cursor is maintained which points 
to the present logical cylinder that is being cleared as 
part of a free space collection process. Therefore, con- 
trol unit 101 can review the free space directory illus- 
trated in FIG. 8 as a backend process to determine 
which logical cylinder on a logical device would most 
benefit from free space collection. Control unit 101 
activates the free space collection process by reading all 
of the valid data from the selected logical cylinder into 
cache memory 113. The logical cylinder is then listed as 
completely empty and linked into the Free Cylinder 
List since all of the virtual track instances therein are 
tagged as obsolete. Additional logical cylinders are 
collected for free space collection purposes or as data is 
received from an associated data processor 2 until a 
complete logical cylinder has been filled. Once a com- 
plete logical cylinder has been filled, a new previously 
emptied logical cylinder is chosen. 

FIG. 10 illustrates in flow diagram form the opera- 
tional steps taken by processor 204 to implement the 
free space collection process. When Free Space collec- 
tion has to be done, the best logical cylinder to collect 
is the one with the most sectors already free. This leads 
to the notion of a list of all of the logical cylinders in 
data storage subsystem 100 ordered by the amount of 
Free Space each contains. Actually, a list is maintained 
for each logical device, since it is desirable to balance 
free space across logical devices to spread virtual actua- 
tor contention as evenly as possible over the logical 
actuators. The collection of lists is called the Free Space 
Directory; the list for each logical device is called the 
Free Space List for the logical device. Each free space 
entry represents a logical cylinder. Each free space 
directory entry (FIG. 14) contains a forward and a backward pointer to create a doubly linked list as well. Each logical device's Free Space List is terminated by head and tail pointers.
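A Free Space List of this kind might look like the following Python sketch: one doubly linked list per logical device, with head and tail pointers, kept in descending order of free sectors so that the best collection candidate sits at the head. The class names and the `best_cylinder` policy (simply comparing the heads of all lists) are assumptions made for illustration.

```python
class FreeSpaceEntry:
    """One Free Space Directory entry, representing one logical cylinder."""

    def __init__(self, cylinder, free_sectors):
        self.cylinder = cylinder
        self.free_sectors = free_sectors
        self.prev = None  # backward pointer
        self.next = None  # forward pointer


class FreeSpaceList:
    """Free Space List for one logical device, ordered by descending free sectors."""

    def __init__(self):
        self.head = None
        self.tail = None

    def insert(self, entry):
        node = self.head
        while node is not None and node.free_sectors >= entry.free_sectors:
            node = node.next
        if node is None:                      # append after the current tail
            entry.prev, entry.next = self.tail, None
            if self.tail is not None:
                self.tail.next = entry
            else:
                self.head = entry
            self.tail = entry
        else:                                 # insert just before node
            entry.prev, entry.next = node.prev, node
            if node.prev is not None:
                node.prev.next = entry
            else:
                self.head = entry
            node.prev = entry

    def unlink(self, entry):
        if entry.prev is not None:
            entry.prev.next = entry.next
        else:
            self.head = entry.next
        if entry.next is not None:
            entry.next.prev = entry.prev
        else:
            self.tail = entry.prev
        entry.prev = entry.next = None

    def update(self, entry, free_sectors):
        """Re-position an entry after its free-sector count changes."""
        self.unlink(entry)
        entry.free_sectors = free_sectors
        self.insert(entry)


def best_cylinder(directory):
    """Return the entry with the most free sectors among the heads of all device lists."""
    heads = [lst.head for lst in directory.values() if lst.head is not None]
    return max(heads, key=lambda e: e.free_sectors, default=None)
```

Keeping each list sorted makes the candidate lookup cheap at the cost of re-linking an entry whenever its free-sector count changes.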

Each logical cylinder contains in its last few sectors a 
directory of its contents, called its Logical Cylinder 
Directory (LCD). This directory contains an entry for 
each virtual track instance contained within the logical 
cylinder. The entry for a virtual track instance contains the identifier of the virtual track and the identifier of the relative sector within the logical cylinder in which the virtual track instance begins. From this directory, the serial number of the logical cylinder instance, and the Virtual Track Directory, the Free Space Collection Process can determine which virtual track instances are still current in the logical cylinder and therefore need to be moved to make the logical cylinder available for writing new data.
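The validity determination described above, together with the relocation loop of FIG. 10 (steps 1002-1020), can be sketched as follows. The sketch assumes that the Virtual Track Directory stores each track's current (cylinder, sector) address and that relocated tracks receive a hypothetical slot-style address in the target cylinder; staging through cache memory 113 is not modeled, and all names are illustrative.

```python
def collect_cylinder(lcd, vtd, cylinder, target_cylinder, target, free_list):
    """Hypothetical sketch of basic free space collection for one logical cylinder.

    lcd:       Logical Cylinder Directory entries as (virtual track, relative sector) pairs
    vtd:       Virtual Track Directory, virtual track -> current (cylinder, sector)
    target:    list of tracks already written to the open collection cylinder
    free_list: cylinders available for new data
    Returns the virtual tracks that were relocated.
    """
    moved = []
    for vt, sector in lcd:                              # step 1003: look up each track in the VTD
        if vtd.get(vt) != (cylinder, sector):           # step 1005: addresses differ, instance is obsolete
            continue                                    # step 1017: do not relocate this track
        target.append(vt)                               # steps 1006-1008: stage and destage to target
        vtd[vt] = (target_cylinder, len(target) - 1)    # step 1011: hypothetical slot-style address
        moved.append(vt)
    free_list.append(cylinder)                          # step 1020: collected cylinder is now free
    return moved
```

A track whose VTD entry points elsewhere has been rewritten since this cylinder was filled, so its instance here is a "hole" and is simply dropped.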

The basic process is initiated at step 1000 when pro- 
cessor 204 opens a logical cylinder to receive data col- 
lected, then proceeds to step 1001 where processor 204 
selects a Logical Cylinder (TLC) for collection based on 
the number of free logical sectors as listed in the Free 
Space Directory table of FIG. 8. At step 1002, proces- 
sor 204 reads the logical cylinder directory for the logi- 
cal cylinder that was selected at step 1001. Processor 
204 then at step 1003 reads the logical address from the 
virtual track directory (VTD) entry for each virtual 
track address that is contained in the read logical cylin- 
der directory. At step 1005, processor 204 compares the 
logical address that was stored in the virtual track direc- 
tory entry with the logical address that was stored in 
the logical cylinder directory. If these two addresses do 
not match, that indicates the track instance is not valid 
for this virtual address and at step 1017 processor 204 
determines that this track should not be relocated and execution exits.

If, at step 1005, processor 204 determines that the virtual address stored in the virtual track descriptor matches the virtual address stored in the logical cylinder directory, at step 1006 the virtual track instance is staged into a predetermined location in cache memory 113. Processor 204 destages the virtual track instance to the disk drive subset 103 that contains the logical cylinder used by this free space collection process at step 1008. At step 1011, processor 204 updates the virtual track directory entry and exits at step 1020. At step 1020, processor 204 updates the free space directory to indicate that the collected cylinder is now a free cylinder available for data storage purposes, that the data previously contained therein has been collected to a designated logical cylinder, and that the appropriate mapping table entries have been updated.

Enhanced Free Space Collection

Enhanced Free Space Collection occurs when a cylinder is collected that has already been collected before, as indicated by the Collected Flag in the Logical Cylinder Table (LCT). When data is collected and written to a cylinder separate from the normal destaging cylinder, that data is Read-Only or Low Access relative to the rest of the data in the Logical Cylinder, since any data that is updated is written to new cylinders. Data that is collected a second time is Read-Only or Low Access relative to all the data in the subsystem, so it is Archive data. When Free Space Collection collects a cylinder that has not been collected before, the basic Free Space Collection Algorithm, as described in the previous section, is used. When Free Space Collection collects a cylinder that has the Collected or the Archive Flag in the LCT set, the Enhanced Free Space Collection Algorithm is used. FIG. 11 illustrates in flow diagram form the operational steps taken by processor 204 in control unit 101 of the data storage subsystem 100 to perform Enhanced Free Space Collection. The differences between Basic and Enhanced Free Space Collection are minor, but they are important to the hierarchical algorithm since they differentiate data into Low Access and Regular Access Logical Cylinders. In step 1100, we allocate two logical cylinders to receive the data collected during free space collection. One cylinder is used for Low Access Data and the other is used for Regular Access Data. Steps 1001 through 1006 are the same as the basic algorithm. At step 1107 there is a test to determine if the virtual track that has been read from the cylinder being collected is Low Access. The track is low access if the Virtual Track Access Counter from the VTD divided by the age of the logical cylinder is below a low access threshold. The age of the Logical Cylinder is calculated by subtracting the Creation Date/Time (in the LCD) from the Current Date/Time. If the virtual track is low access, the data is written, at step 1108, to the low access logical cylinder. If the virtual track is not low access, the data is written, at step 1109, to the regular access logical cylinder.

Migrate Logical Cylinder

Data that is stored in Low Access Cylinders can be migrated to secondary media, such as magnetic tape 10A or bulk disk storage, such as optical media. This is accomplished automatically and dynamically in disk drive array 100 by control unit 101. The data migration process illustrated in FIG. 15 is initiated at step 1501 either periodically by control unit 101 to migrate data to secondary media on a regular basis or on a demand driven basis, such as when the number of available logical cylinders falls below a predetermined threshold or
the number of relative versions of a generation data 
group exceeds a predetermined threshold. In either 
case, control unit 101 initiates the migration process at step
1501 and selects a logical cylinder at step 1502, identi- 
fied as a low access cylinder by calculating the access 
rate from the last three fields in the Free Space Direc- 
tory Entry as illustrated in FIG. 14. At step 1503, con- 
trol unit 101 writes the selected logical cylinder to sec- 
ondary media 10A. 

In operation, the selected logical cylinder is read 
from the redundancy group on which it is stored to 
cache memory 113 as described above. Once staged to 
cache memory 113, the selected logical cylinder is 
transferred to secondary media 10A via tape drive control unit 10 and data channel 20 in a well-known manner as described above. Once the data write process is completed, control unit 101 at steps 1504 and 1505 updates the status of the secondary media directory and virtual track directory, respectively, to indicate the archived nature of the migrated logical cylinder. At step 1506,
the logical cylinder in disk drive array 100 that stored 
the migrated logical cylinder is marked as free in the 
free space directory. The migration process concludes 
at step 1507 if no further logical cylinders are available 
for migration. Otherwise, the process of FIG. 15 is 
repeated. 
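The low access test of Enhanced Free Space Collection (access counter divided by cylinder age, compared against a threshold) and the demand-driven migration triggers of step 1501 reduce to a few comparisons. In the sketch below, the thresholds, field names, and the rule of migrating the low access cylinder with the lowest rate first are assumptions; the actual selection at step 1502 uses fields of the FIG. 14 entry, and the periodic trigger is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CylinderStats:
    cylinder: int
    access_count: int      # hypothetical per-cylinder access counter
    created: float         # creation date/time, in seconds
    low_access: bool = False


def access_rate(count, created, now):
    """Accesses per second of age; a cylinder of age 0 is treated as infinitely hot."""
    age = now - created
    return count / age if age > 0 else float("inf")


def is_low_access(track_access_count, cylinder_created, now, threshold):
    # Step 1107: Virtual Track Access Counter / logical cylinder age < low access threshold.
    return access_rate(track_access_count, cylinder_created, now) < threshold


def should_migrate(free_cylinders, min_free, gdg_versions, max_versions):
    # Step 1501, demand-driven case: too few free cylinders or too many generation versions.
    return free_cylinders < min_free or gdg_versions > max_versions


def pick_for_migration(cylinders, now):
    # Step 1502 (assumed policy): the low access cylinder with the lowest access rate.
    candidates = [c for c in cylinders if c.low_access]
    return min(candidates, key=lambda c: access_rate(c.access_count, c.created, now), default=None)
```

For example, a track read twice in a cylinder that is 100 seconds old has a rate of 0.02 accesses per second and would go to the low access cylinder under a threshold of 0.05.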
Database Example 

This example is to illustrate an application of the 
apparatus of the present invention. This example makes 
use of a database that is used for payroll purposes and 
contains the following files:

File Name                Contents
Personnel.Salaries       Salary information from personnel system.
Accounting.Income.Tax    IRS information regarding a specific payroll.
Payroll.Data             Actual payroll data.
Payroll.Data.Index       Index to the payroll database.
Metadata                 Data used by the host processor to access data sets on that volume.



As illustrated in FIG. 1, the data sets are initially located on volumes 11-1, 11-2, 11-3 in a distributed manner in the functional volumes that are directly addressable by the data processor 2. All of these data sets are related in that they are necessary elements of the payroll function. When the user creates the monthly payroll, the user application program 3 makes use of this set of data sets and therefore, all of these data sets must be temporally synchronized. In addition, if a user needed to recreate an old payroll run, due to an error that was found in that payroll run, all of these data sets are again required and these data sets must be temporally coordinated as of the time the user application program 3 ran the payroll which contained the error. The set of data sets must be temporally coordinated for each payroll iteration and therefore archive copies of this set of data sets must be time coordinated as well as the present version of this set of data sets.

To accomplish this, the user defines a snapshot application data group (SADG) that is associated with the above set of data sets. Assume for the purpose of this example that the user defined this snapshot application data group using the name "PAYROLL.SADG". The records defining the PAYROLL.SADG would be contained in a SADG database on host processor 2 in the file server system catalog 5b. The following tables illustrate typical contents of such a snapshot application data group definition.

TABLE 1
PAYROLL.SADG Definition
Consists of:   …
               Personnel.Salaries
               Accounting.Income.Tax
Keep 12 generations
No expiration

TABLE 2
PAYROLL.SADG.G0023V01 Definition
As of 6/3/92 at 09:30:45:53
Source Volumes = Snapshot Volumes
V0001 = SV9356, 6 Segments
V0002 = SV2…, … Segments
V0003 = SV67…, … Segments

As can be seen from the example of Table 1, the file server system catalog 5b contains snapshot application data group definition and administration information. In particular, the definition information includes a list of all of the data sets in this group that comprise the PAYROLL.SADG application data group. In addition, administrative information such as the number of versions or instances of the snapshot application data group that should be maintained in file server system 1 is also defined. Further information of an administrative nature can also be stored in this definition file, such as the expiration date beyond which copies of data sets should not be stored, identification of preferred media to store certain instances of a generation data group, etc. The information contained in Table 2 notes the specific mapping of each instance of a snapshot application data group to the snapshot volume, while other tables note the particular physical storage location in file server system 1 wherein the set of data sets is stored. The mapping includes a definition of the snapshot volumes which are outside the functional address space of data processor 2 as well as the formula required to map each of these volumes.

To better understand the use of these tables and the apparatus described above, the following description notes the creation of a snapshot copy of a particular snapshot application data group as illustrated in flow diagram form in FIG. 20. At step 2001, the data processor 2 terminates all activity on the data sets that are contained within the selected snapshot application data group that is to be copied. The termination of all activity is necessary because a plurality of user programs 3 can concurrently access the data sets contained within this set of data sets. Therefore, to ensure that a single temporally coordinated copy of the set of data sets is made, the data processor 2 terminates all access to the source data sets within this snapshot application data group as defined in the definitional information noted in Table 1. At step 2002, if any of these data sets in this snapshot application data group are under the control of a database management system, the user program 3 in conjunction with data processor 2 uses the database management system utilities to checkpoint these data sets to the database management system log files to ensure consistency of the snapshot application data group with respect to the database management system that operates on the selected data sets within this snapshot application data group. Processing advances to step 2003 where the user program 3 issues a snapshot copy command which is transmitted to the file server system utility 6. At step 2004, the file server system utility 6 accesses the snapshot application data group definition records that are stored in catalog 5b, as illustrated in Table 1 above, to determine the processing options and the administrative handling required for this snapshot application data group. The file server system utility 6 also determines whether the user program 3 has access authorization to this snapshot application data group. At step 2005, the file server system utility 6 determines the identity of the functional volumes that contain the source data sets which, in this example, are functional volumes 11-1, 11-2, 11-3, which volumes are identified by the source volume indicia V0001, V0002, V0003, respectively. At step 2006, the file server utility 6 issues a command to file server system 1 to create a snapshot volume for the identified source volume. This is accomplished by transmitting the command through data channel interface 7 over data channel 8 to file server system 1. At step 2007, file server system 1 receives the issued command and accesses the mapping table stored therein to determine the availability of snapshot volume resources to execute the received command. If insufficient resources are available, the file server system 1 returns an error code at step 2013 to the file server system utility 6, which then aborts the snapshot copy operation. If there are sufficient resources available to make a snapshot volume copy of the identified source volume, at step 2008, file server system 1 makes a copy of the pointers associated with the identified source volume as described above. The file server system 1 also returns to the file server system utility 6 a snapshot volume identification (SVID) and the number of segments of the snapshot volume consumed by the copy, as a measure of resource consumption. At step 2009, the file server system utility 6 updates the catalog information as illustrated in Table 2 to note the correspondence between the source volumes and the snapshot volumes for this particular instance of the snapshot application data group and also notes the number of segments consumed for each of the snapshot volumes. At step 2010, file server system utility 6 determines whether additional source volumes need to be copied. If additional source
volumes need to be copied, processing returns to step 2006 and steps 2006-2009 are repeated until file server system utility 6 determines that no further source volumes remain to be copied. At this point, processing advances to step 2011 wherein the file server system utility 6 posts all records and notifies the user application program 3 that a new instance of this snapshot application data group has been created. At step 2012, user application program 3 resumes normal processing of the source data sets. In this manner, the file server system 1 creates copies of the source volumes independent of the data processor 2 and these copies can be maintained on whatever media is designated by file server system 1 as described above. Therefore, the creation of instances of a snapshot application data group takes place independent of the data processor 2, thereby reducing the load on the data processor 2 and enabling the file server system 1 to maintain the various instances of each snapshot application data group on various types of media without requiring changes to data processor 2 or the operating system 4 resident thereon. This provides the user with the capability of adding new media to file server system 1 or modifying the operation of file server system 1 without impacting the data processor 2, since the operation of file server system 1 is completely transparent to data processor 2.
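The snapshot copy loop of steps 2005-2013 can be sketched as follows, modeling each volume as a list of data pointers so that a snapshot is a duplicated pointer list rather than a copy of the data. The one-segment-per-pointer resource measure, the SVID format, and all names are hypothetical; a failure partway through a group is not rolled back, and snapshot volumes of expired generations are not released.

```python
class InsufficientResources(Exception):
    """Stands in for the error code returned at step 2013."""


class FileServer:
    """Hypothetical model of file server system 1; each volume is a list of data pointers."""

    def __init__(self, volumes, free_segments):
        self.volumes = volumes
        self.free_segments = free_segments
        self.snapshots = {}   # snapshot volume id -> duplicated pointer list
        self._count = 0

    def snapshot_volume(self, source):
        pointers = self.volumes[source]
        segments = len(pointers)               # assumption: one segment per duplicated pointer
        if segments > self.free_segments:      # step 2007: check snapshot volume resources
            raise InsufficientResources(source)
        self.free_segments -= segments
        self._count += 1
        svid = f"SV{self._count:04d}"          # hypothetical SVID format
        self.snapshots[svid] = list(pointers)  # step 2008: copy the pointers, not the data
        return svid, segments


def snapshot_group(server, group, source_volumes, catalog, keep=12):
    """Steps 2005-2011: snapshot every source volume of a group and catalog the instance."""
    instance = {}
    for volume in source_volumes:              # steps 2006-2010: one snapshot per source volume
        instance[volume] = server.snapshot_volume(volume)  # step 2009: record SVID and segments
    generations = catalog.setdefault(group, [])
    generations.append(instance)               # step 2011: post the new instance
    del generations[:-keep]                    # keep only the newest generations, as in Table 1
    return instance
```

Because only pointers are duplicated, taking the snapshot costs little time on data processor 2, which matches the point above that the copies are made independent of the data processor.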

User Access to Snapshot Application Data Group
Once various instances of snapshot application data groups have been created by file server system 1, the user can access these instances for a number of reasons in a very controlled manner. The first example is the automatic creation of tape backups of source data sets. This can be done as described above with regard to the archive memory 10 or, alternatively, can be done via data processor 2 in the traditional manner wherein the data processor 2 retrieves the data sets to be archived and transmits them via a data channel to an archive memory which is directly connected to data processor 2. This would be accomplished by mounting the designated snapshot volumes as a functional volume directly addressable by data processor 2 using the file server system utilities. Once these snapshot volumes are mounted, data processor 2 can retrieve the data contained therein and transmit the retrieved data to the archive memory that is directly connected to data processor 2.

A second application of the copies of a snapshot application data group is in the recovery of an application job failure wherein the data can be corrupted during processing. It is a common practice in batch processing jobs to make a copy of the critical data sets prior to update processing in the data processor 2 so that if the processing fails the copy of the data sets can be restored and used to replace the corrupted data on data processor 2. In this application, the user application program 3 transmits commands to the file server system utility immediately prior to transmission of the data to the batch processing program on data processor 2, to create a snapshot copy of the data that will be used in the batch processing. The snapshot copy operation is accomplished as described above and if there is a failure in the batch processing, the customer application program 3 can transmit commands to the file server system utility to request that the snapshot volumes be mounted in place of the source volumes to effectively delete all data on the source volumes and replace them with the data from the snapshot volumes. The meta data is also restored in this process and the customer has some administrative issues to address with regard to maintaining the meta data from the snapshot application data group instance that contained the corrupted data.
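The batch-protection pattern of this second application can be sketched as below: take a pointer-level snapshot before the update, and if the job fails, put the snapshot in place of the source volumes. The dictionary model of volumes and the function name are assumptions, and the meta data administration mentioned above is not modeled.

```python
def run_protected(volumes, job):
    """Snapshot the source volumes, run the batch job, and restore the snapshot if it fails.

    volumes: hypothetical dict of source volume -> list of data pointers.
    job:     callable that updates the volumes in place and raises on failure.
    """
    snapshot = {name: list(ptrs) for name, ptrs in volumes.items()}  # pointer copy before the update
    try:
        job(volumes)
    except Exception:
        volumes.clear()            # "mount" the snapshot in place of the source volumes
        volumes.update(snapshot)
        raise                      # let the caller see the original failure
```

Re-raising after the restore keeps the failure visible to the caller, which still has to decide whether to rerun the job against the restored data.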

A third application of copies of a snapshot application 
data group is for the movement of data sets from one 
volume in file server system 1 to another volume that is 
external to file server system 1 or even to another vol- 
ume within file server system 1 where file server system 
1 is a hierarchical data storage system. The movement 
of data sets from one volume to another is typically for 
performance reasons to place the data sets on an appropriate media that corresponds to the needs of data processor 2 and the user application programs 3. The transmission of the data sets from one volume to another can
be accomplished through data processor 2 or can be 
done internally in file server system 1, depending on the 
location of the target volume. 

While a specific embodiment of this invention has 
been disclosed herein, it is expected that those skilled in 
the art can design other embodiments that differ from 
this particular embodiment but fall within the scope of 
the appended claims. 

We claim: 

1. A file server system for storing data sets for at least 
one data processor comprising: 

a plurality of data storage volumes, each of which is 
capable of storing at least one data set received 
from a data processor; 

means for maintaining data set pointers indicative of a 
set of data sets managed as a single data entity 
consisting of a plurality of interrelated ones of said 
data sets stored in first available memory space in a 
plurality of said data storage volumes in said file 
server system; 

means, responsive to the subsequent receipt of a data 
set access request from a data processor identifying 
one of said data sets stored in said set of data sets, 
for creating a new version of said set of data sets 
that contains said requested data set, independent 
of said data set requesting data processor, includ- 
ing: 

means for identifying a physical memory location 
of each data set in said set of data sets that con- 
tains said requested data set as specified by its 
data set pointer, 

means for generating a new data set pointer, dupli- 
cative of said data set pointer, as the data set 
pointer for said copy of each said data set in said 
set of data sets, 

means for maintaining data indicative of a corre- 
spondence between said data set pointers and 
said duplicative data set pointers, and 

means for providing said data set requesting data 
processor with access to said set of data sets via 
said duplicative data set pointers. 

2. The file server system of claim 1 wherein said 
creating means further comprises: 

means, responsive to said data set requesting data
processor modifying data in said requested data set, 
for copying said requested data set to second avail- 
able memory space in another of said plurality of 
data storage volumes; 

means for modifying said duplicative data set pointer 
to identify said second available memory space as a 
copy of said requested data set; and 

means for updating said data set pointer to indicate 
that said data set, stored in said physical memory 
location is a prior version of said requested data set. 
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3. The file server system of claim 2 wherein said means, responsive to said file server system archiving 
copying means copies a one of said plurality of said data a set of data sets identified by one of said set of data 
storage volumes containing said requested data set to sets pointers in archive memory means, for deleting 
second available memory space in another of said plu- the set of data sets pointer corresponding to said 
rality of data storage volumes. 5 archived set of data sets from said series of set of 

4. The file server system of claim 1 wherein said data sets pointers. 

creating means is further responsive to a subsequent H- The file server system of claim 10 further corn- 
receipt of another data set access request from a data prising: 

processor identifying one of said data sets stored in said means, responsive to said rewriting means, for storing 

set of data sets and said duplicative set of data sets, for 10 data indicative of the memory location in said ar- 

cr eating a third version of said requested data set, inde- chive memory means in -which said archived set of 

pendent of said subsequent data set access requesting data scts & stored. 

data processor. 12. The file server system of claim 10 further com- 

5. The file server system of claim 1 further compris- prising: 

ing. 15 means for determining the amount of said available 

means for storing threshold data indicative of the memory space in said data storage volumes; and 

number of prior versions of each set of data sets means for rescttm g ^ d threshold as a function of said 

that can be stored in said file server system. determined available memory space. 

6. The file server system of claim 5 further compris- 13 171(5 me server svstem of claim 1 compris- 
ing: 20 m S : 

means for creating a series of said data set pointers, means for stonng data received from a user of said file 

said series being indicative of a time ordered se- se ™* system mdicaave of the identity of each of 

quence of prior versions of a particular set of data ****** SetS m SCt ° f « . 

sets 14. The file server system of claim 1 wherein said 

7. The file server system of claim 6 further compris- 25 *^aining means operates independent of said at least 
• r one data processor. 

" „ . c . j v 15. The file server system of claim 1 wherein said 

means, responsive to a generation of a next duplica- . . . . J . 

tive data set pointer for an identified set of data ""Hf ""S , . . , t 

sets, for inserting said next duplicative data set „ ""T 8 . f« ^ I^J^ of data storage 

pointer into a one of said series of data set pointers 30 ,tT^h TTf ^ ^' 

corresponding to said series associated Jith said S ™™° f S 0rage ™ Iume * 

identified set of data sets. Son 2 

8. The file server system of claim 7 further compris- means for ' presenting a data storage image of a se- 
m ^ . . „ 35 lected one of said plurality of virtual data storage 

means, responsive to said file server system wntmg a volumes to each of said at least one data processor, 

set of data sets identified by one of said duplicative 16 . ^ flle sXem of claim 15 ^ 

data set pointers in a selected available memory presentmg means transforms the format of said set of 

space, for deleting an oldest data set pointer from data sets contail ^ lg M reque sted data ^ prior to 

said series of set of data sets pomters when the 40 enabling ^ cess to said requested data set by said data 

number of pomters in said series exceeds said set requesting data processor. 

n H?f eS ^? 1<± „ * . „ * 17. The file server system of claim 15 wherein said 

9. The file server system of claim 8 further compns- creating means further comprises: 

m S : , means for transferring said set of data sets containing 

cache memory means connected to and interconnect- 45 said requested data set from a first data storage 

ing said data processor and said data storage vol- volume to a second data storage volumej wherein 

umes for stonng data sets transmitted therebe- said ^ ^ second data storage volumes have 

tw . een > different physical data storage characteristics, 

archive memory means connected to said cache ig. The file server system of claim 1 wherein said 

memory means for stonng data sets that were pre- 50 maintaining means comprises: 

viously stored in said file server system by said data means for configuring said plurality of data storage 

processor; devices into m virtual data storage volumes, each 

means, responsive to the writing of a set of data sets Q f said virtual data storage volumes being capable 

in a selected available memory space, for compar- of storing at least one data set thereon, wherein m 

ing the number of set of data sets pointers in said 55 is a positive integer greater than one; and 

series of set of data sets pointers, corresponding to means for presenting a data storage image of n data 

said written set of data sets, to said threshold; storage volumes directly addressable by said at 

means, responsive to said number of set of data sets least one data processor, to said at least one data 

pointers exceeding said threshold, for signifying an processor, wherein n is a positive integer greater 

oldest set of data sets pointer in said series of set of 60 than zero and less than m. 

data sets pointers as archivable; 19. The file server system of claim 18 wherein said 

means for transferring said archivable set of data sets rnaintaining means further comprises: 

from said data storage volumes to said cache mem- means, responsive to a one of said at least one data 

ory means; and processor requesting access to a selected data set in 

means for rewriting said cached archivable set of data 65 a set of data sets that is stored on a one of said m 

sets into said archive memory means. data storage volumes that is not directly address- 

10. The file server system of claim 9 further compris- able by said at least one data processor, for mount- 
ing 1 ing said set of data sets containing said requested 
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data set on a one of said n directly addressable data copying, in response to said data set requesting data 

storage volumes. processor modifying data in said requested data set, 

20. The file server system of claim 19 wherein said said requested data set to second available memory 
maintaining means further comprises: space in another of said plurality of data storage 

means for transferring said set of data sets containing 5 volumes; 

said requested data set from a first data storage modifying said duplicative data set pointer to identify 

volume having a first set of physical data storage said second available memory space as a copy of 

characteristics to a second data storage volume said requested data set; and 

having a second set of physical data storage char- updating said data set pointer to indicate that said 

acteristics. 10 data set, stored in said physical memory location is 

21. The file server system of claim 1 wherein said a pr ip r version of said requested data set 
plurality of data storage volumes comprises: 24. The method of claim 23 wherein said step of 

a plurality of disk drives for storing data thereon, a copying copies a one of said plurality of said data stor- 
number of said disk drives being configured into at age volumes containing said requested data set to sec- 
least two redundancy groups, each said redun- 15 C nd available memory space in another of said plurality 
dancy group including n+m of said plurality of of ^ storage volumes. 

disk drives, where n and m are both positive into- 25 ^ meth od of claim 22 wherein said step of cre- 

gers with n greater than 1 and m at least equal to 1; atin ^ responsivc to a subse quent receipt of 

means for storing each of a plurality of data sets re- datft ^ ^ ff0m a ^ 

ceived I from said data processor on successive ones 20 Mentifyin one of ^ data ^ stored m ^ d ^ of data 

of said n disk drives in a selected redundancy ^ ^ ^ duplicative set of data ^ for creating a 

group, ■ J third version of said requested data set, independent of 

means, res^stye to said stomg means storing data ^ fe ^ J requesting data proces- 

sets on all n disk drives m said selected redundancy ^ H 6 * 

group, for generating m segments of data redun- 25 so ^' , . , . r A 

dancy information for said data sets stored on said J^J 1 * mcthod of claun 22 further «"nP™mg the 

melri^ n wrking said m segments of redundancy storin S threshold data indicative of the number of 

data on to said m disk drives of said selected redun- pnor versions of each set of data sets that can be 

dancy group- and 30 stored m 531(1 fde server svstcra * 

means, responsive to said writing means, for general- 21 ' J** mcthod of claim 26 comprising the 

ing a data set pointer for each of said data sets stc P °* : 

stored on said n disk drives identifying the physical creating a series of said data set pointers, said series 

memory location of each said data set in said redun- ^lag indicative of a time ordered sequence of pnor 

dancy group. 35 versions of a particular set of data sets. 

22. In a file server system having a plurality of data M - The method of claim 27 further comprising the 
storage volumes, each of which is capable of storing at ste P 

least one data set received from a data processor, a inserting, in response to a generation of a next dupli- 

method for storing data sets for at least one data proces- cative data set pointer for an identified set of data 

sor, comprising the steps of: 40 sets ' f^d next duplicative data set pointer into a one 

maintaining data set pointers indicative of a set of of said series of data set pointers corresponding to 

data sets managed as a single data entity consisting said series associated with said identified set of data 

of a plurality of interrelated ones of said data sets se ^ & - 

stored in first available memory space in a plurality 29* The method of claim 28 further comprising the 

of said data storage volumes in said file server 45 of: 

system; deleting, in response to said file server system writing 
creating, in response to the subsequent receipt of a a set of data sets identified by one of said duplica- 
data set access request from a data processor identi- tive data set pointers in a selected available mem- 
fying one of said data sets stored in said set of data ory space, an oldest data set pointer from said series 
sets, a new version of said set of data sets that con- 50 of set of data sets pointers when the number of 
tains said requested data set, independent of said pointers in said series exceeds said threshold, 
data set requesting data processor, including: 30. The method of claim 29, wherein said file server 
identifying a physical memory location of each system includes a cache memory connected to and in- 
data set in said set of data sets that contains said terconnecting said data processor and said data storage 
requested data set as specified by its data set 55 volumes for storing data sets transmitted therebetween 
pointer, and an archive memory connected to said cache mem- 
generating a new data set pointer, duplicative of ory for storing data sets that were previously stored in 
said data set pointer, as the data set pointer for said file server system by said data processor, the 
said copy of each said data set in said set of data method further comprising the steps of: 
sets, 60 comparing, in response to the writing of a set of data 
maintaining data indicative of a correspondence sets in a selected available memory space, the num- 
between said data set pointers and said duplica- ber of set of data sets pointers in said series of set of 
tive data set pointers, and data sets pointers, corresponding to said written set 
providing said data set requesting data processor of data sets, to said threshold; 

with access to said set of data sets via said dupli- 65 signifying, in response to said number of set of data 

cative data set pointers. sets pointers exceeding said threshold, an oldest set 

23. The method of claim 22 wherein said step of ere- of data sets pointer in said series of set of data sets 
ating further comprises: pointers as archivable; 
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transferring said archivable set of data sets from said 
data storage volumes to said cache memory; and 

rewriting said cached archivable set of data sets into 
said archive memory. 

31. The method of claim 30 further comprising the
step of: 

deleting, in response to said file server system archiv- 
ing a set of data sets identified by one of said set of 
data sets pointers in archive memory, the set of 
data sets pointer corresponding to said archived set 
of data sets from said series of set of data sets point- 
ers. 

32. The method of claim 31 further comprising the 
step of: 

storing, in response to said step of rewriting, data 
indicative of the memory location in said archive 
memory in which said archived set of data sets is 
stored. 

33. The method of claim 31 further comprising the
steps of: 

determining the amount of said available memory 
space in said data storage volumes; and 

resetting said threshold as a function of said deter-
mined available memory space.

34. The method of claim 22 further comprising the 
step of: 

storing data received from a user of said file server 
system indicative of the identity of each of said data 
sets in said set of data sets.

35. The method of claim 22 wherein said step of main- 
taining operates independent of said at least one data 
processor. 

36. The method of claim 22 wherein said step of main-
taining comprises:

configuring said plurality of data storage devices into 
a plurality of virtual data storage volumes, each of 
said virtual data storage volumes being capable of 
storing at least one data set thereon; and 

presenting a data storage image of a selected one of 
said plurality of virtual data storage volumes to 
each of said at least one data processor. 

37. The method of claim 36 wherein said step of pres-
enting transforms the format of said set of data sets
containing said requested data set prior to enabling 
access to said requested data set by said data set request- 
ing data processor. 

38. The method of claim 36 wherein said step of cre-
ating further comprises:
transferring said set of data sets containing said re- 
quested data set from a first data storage volume to 
a second data storage volume, wherein said first
and second data storage volumes have different
physical data storage characteristics. 

39. The method of claim 22 wherein said step of main- 
taining comprises: 

configuring said plurality of data storage devices into 
m virtual data storage volumes, each of said virtual 
data storage volumes being capable of storing at 
least one data set thereon, wherein m is a positive 
integer greater than one; and 

presenting a data storage image of n data storage
volumes directly addressable by said at least one 
data processor, to said at least one data processor, 
wherein n is a positive integer greater than zero 
and less than m. 

40. The method of claim 39 wherein said step of main- 
taining further comprises: 

mounting, in response to a one of said at least one data 
processor requesting access to a selected data set in 
a set of data sets that is stored on a one of said m 
data storage volumes that is not directly address- 
able by said at least one data processor, said set of 
data sets containing said requested data set on a one 
of said n directly addressable data storage volumes. 

41. The method of claim 40 wherein said step of main-
taining further comprises:

transferring said set of data sets containing said re- 
quested data set from a first data storage volume 
having a first set of physical data storage character- 
istics to a second data storage volume having a 
second set of physical data storage characteristics. 

42. The method of claim 22 wherein said plurality of 
data storage volumes comprises a plurality of disk 
drives for storing data thereon, a number of said disk 
drives being configured into at least two redundancy 
groups, each said redundancy group including n+m of 
said plurality of disk drives, where n and m are both 
positive integers with n greater than 1 and m at least 
equal to 1, said method further comprises the steps of: 

storing each of a plurality of data sets received from 
said data processor on successive ones of said n disk 
drives in a selected redundancy group; 

generating, in response to said step of storing data sets 
on all n disk drives in said selected redundancy 
group, m segments of data redundancy information 
for said data sets stored on said n disk drives; 

writing said m segments of redundancy data on to 
said m disk drives of said selected redundancy 
group; and 

generating, in response to said step of writing, a data 
set pointer for each of said data sets stored on said 
n disk drives identifying the physical memory loca- 
tion of each said data set in said redundancy group. 
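The version-limiting and archiving steps of claims 5 through 12 (and method claims 26 through 33) can be sketched as a bounded, time-ordered series of pointers. This is a minimal Python illustration under stated assumptions: pointers are plain integers standing in for sets of data sets, the cache memory and archive memory are in-memory containers, and the threshold-reset rule is invented for the example, since claim 12 requires only that the threshold be reset "as a function of" available memory space. All names (`VersionSeries`, `insert`, `reset_threshold`) are hypothetical.

```python
from collections import deque


class VersionSeries:
    """Hypothetical time-ordered series of version pointers for one set of
    data sets, bounded by a threshold (in the spirit of claims 5 to 12)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.series = deque()        # pointers, oldest on the left
        self.cache = []              # staging area standing in for cache memory
        self.archive = {}            # archive slot -> archived set (by its pointer)
        self.archive_location = {}   # archived pointer -> archive slot

    def insert(self, pointer: int) -> None:
        # Insert the next duplicative pointer, then enforce the threshold.
        self.series.append(pointer)
        self._enforce_threshold()

    def reset_threshold(self, free_space: int, space_per_version: int) -> None:
        # Hypothetical policy: allow as many prior versions as the free
        # space can hold, but never fewer than one.
        self.threshold = max(1, free_space // space_per_version)
        self._enforce_threshold()

    def _enforce_threshold(self) -> None:
        while len(self.series) > self.threshold:
            oldest = self.series[0]               # signify the oldest as archivable
            self.cache.append(oldest)             # transfer it to cache memory
            slot = len(self.archive)              # rewrite the cached copy to archive
            self.archive[slot] = self.cache.pop()
            self.archive_location[oldest] = slot  # record where it was archived
            self.series.popleft()                 # delete its pointer from the series


history = VersionSeries(threshold=3)
for pointer in [10, 11, 12, 13, 14]:
    history.insert(pointer)
print(list(history.series))                       # [12, 13, 14]
history.reset_threshold(free_space=200, space_per_version=100)
print(list(history.series))                       # [13, 14]
print(history.archive_location)                   # {10: 0, 11: 1, 12: 2}
```

A deque suits this because new versions are appended at one end while the oldest is removed from the other, both in constant time.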
***** 
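Claims 21 and 42 describe an n+m redundancy group in which data sets are stored on successive ones of the n data drives, m segments of redundancy information are generated once all n drives hold data, and the data set pointers are generated only after the redundancy segments are written. The Python sketch below covers only the simplest case the claims allow, m = 1, and assumes byte-wise XOR parity as the redundancy information; the claims do not specify the redundancy code. The fixed segment size, zero padding, and all names (`RedundancyGroup`, `xor_segments`, `rebuild`) are hypothetical.

```python
from functools import reduce


def xor_segments(segments: list[bytes]) -> bytes:
    # Byte-wise XOR of equal-length segments: a single parity segment (m = 1).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*segments))


class RedundancyGroup:
    """Hypothetical n+1 redundancy group: n data drives plus one parity drive."""

    def __init__(self, n: int, segment_size: int) -> None:
        if n < 2:
            raise ValueError("the claims require n greater than 1")
        self.n = n
        self.segment_size = segment_size
        self.drives = [[] for _ in range(n + 1)]  # drives[n] holds parity
        self.open_stripe = []                     # names stored in the current stripe
        self.pointers = {}                        # name -> (drive, stripe)

    def store(self, name: str, data: bytes) -> None:
        if len(data) > self.segment_size:
            raise ValueError("data set larger than one segment")
        # Store the data set, zero-padded, on the next of the n data drives.
        drive = len(self.open_stripe)
        self.drives[drive].append(data.ljust(self.segment_size, b"\0"))
        self.open_stripe.append(name)
        if len(self.open_stripe) == self.n:
            self._close_stripe()

    def _close_stripe(self) -> None:
        stripe = len(self.drives[self.n])
        # All n drives now hold data: generate and write the parity segment.
        data = [self.drives[d][stripe] for d in range(self.n)]
        self.drives[self.n].append(xor_segments(data))
        # Only after the parity write, record each data set's physical location.
        for drive, name in enumerate(self.open_stripe):
            self.pointers[name] = (drive, stripe)
        self.open_stripe = []

    def rebuild(self, failed: int, stripe: int) -> bytes:
        # Recover one lost segment by XORing the surviving segments of the stripe.
        survivors = [d[stripe] for i, d in enumerate(self.drives) if i != failed]
        return xor_segments(survivors)


group = RedundancyGroup(n=3, segment_size=4)
for name, data in [("a", b"AAAA"), ("b", b"BB"), ("c", b"CCCC")]:
    group.store(name, data)
print(group.pointers)                      # {'a': (0, 0), 'b': (1, 0), 'c': (2, 0)}
print(group.rebuild(failed=1, stripe=0))   # b'BB\x00\x00'
```

With a single XOR parity segment, any one failed drive in a stripe can be rebuilt from the others; tolerating more simultaneous failures (m greater than 1) would need a stronger code than plain XOR.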